Master of Arts In Mathematics


SIMULATION MODEL USING STANDARDIZED LINEUP TO EVALUATE PLAYER OFFENSIVE VALUE

A thesis presented to the faculty of San Francisco State University in partial fulfillment of the requirements for the degree Master of Arts in Mathematics

by Eugene Beyder

San Francisco, California

June 2015

Copyright by Eugene Beyder

CERTIFICATION OF APPROVAL

I certify that I have read SIMULATION MODEL USING STANDARDIZED LINEUP TO EVALUATE PLAYER OFFENSIVE VALUE by Eugene Beyder and that in my opinion this work meets the criteria for approving a thesis submitted in partial fulfillment of the requirements for the degree: Master of Arts in Mathematics at San Francisco State University.

Professor of Mathematics

Associate Professor of Mathematics

SIMULATION MODEL USING STANDARDIZED LINEUP TO EVALUATE PLAYER OFFENSIVE VALUE

Eugene Beyder

San Francisco State University

2015

Baseball is a sport in which batting statistics are commonly used to assess the offensive value of a player. However, many traditional statistics do not accurately portray a player's true contribution to his team since they overlook a variety of circumstances outside of the hitter's control, in particular his position in the batting order. Using a standardized lineup, we have built a simulation based on Markov chains in which a player is evaluated by his offensive contribution in different batting positions in the lineup and then compared to other players in the league. Thus, this model is able to evaluate a player's offensive skill set in isolation and to distinguish which player has greater value to his team among players with similar traditional offensive statistics. This model reveals several strategies perhaps not yet explored and can be used by major league baseball teams when making various decisions such as signing free agents and setting offensive lineups.

I certify that the Abstract is a correct representation of the content of this thesis.

Chair, Thesis Committee

ACKNOWLEDGMENTS

I want to thank my advisors for their support and guidance. I am extremely thankful for the ability to merge together two of my favorite topics, math and baseball. Of course I want to thank my family for the constant support and encouragement they have provided me throughout my life.

TABLE OF CONTENTS

1 Introduction
  1.1 Background on Baseball
  1.2 Motivation
  1.3 Purpose of Thesis
2 Theory Behind Simulation
  2.1 Background
  2.2 Markov Chain
  2.3 Transitional Matrix
3 Data Collection
  3.1 Collecting Transitional Matrix
  3.2 Individual Player Transitional Matrix
  3.3 Average Player Transitional Matrix
  3.4 Runs Matrix
Building Simulation Model
  How to Simulate a Single Game
  How the Model Will Simulate
Distribution and Ranking of Each Batter Position
  Process for Finding Random Players
  5.2 Testing Data
  Finding a Player's Rank by Batting Position
Results of the Model
Conclusion and Future Work
Appendix
Bibliography

LIST OF TABLES

1.1 Percent of plays during seasons, where WP is wild pitch, PB is passed ball, and errors are errors on pick off attempts
2.1 Examples of Code to represent states of system
Example of a single game simulation
Standardized lineup for model
Shapiro-Wilk test for 30 random players in 1st batting position of standardized lineup
Normality test for batting positions
Random 30 players runs per game for batting positions 1-5 of standardized lineup
Explanation of equation used for finding ranking
Rankings of top players in MLB
Rankings of players with similar PA (Plate Appearances) and BA (Batting Average)
Rankings of players with similar R's and RBI's

LIST OF FIGURES

3.1 Example of Transitional Matrix
Histogram and qqnormal plot for 30 random players runs per game in lead-off position of standardized lineup
Histogram and qqnormal plot for 30 random players runs per game in second position of standardized lineup
Histogram and qqnormal plot for 30 random players runs per game in third position of standardized lineup
Histogram and qqnormal plot for 30 random players runs per game in fourth position of standardized lineup
Histogram and qqnormal plot for 30 random players runs per game in fifth position of standardized lineup

Chapter 1

Introduction

1.1 Background on Baseball

Baseball is a game played between two teams, a home team and a visiting team. A typical game of baseball consists of nine innings, where each inning is broken up into two half innings. The top half of the inning is reserved for the visiting team to play offense while the home team plays defense, and in the bottom half of the inning the roles are reversed. Each team's offense consists of a 9-batter lineup whose members must take turns in that exact designated order for the entirety of the game; this order is set before the game starts, and each time a player takes his turn it is called an at-bat, denoted (AB). [6] A half inning ends when the defensive team has acquired 3 outs. The goal for the offense is to score as many times as possible before the defense gets three outs. An out can be made many ways depending on the situation of runners on base. Each new inning begins with the batter who follows the player

responsible for hitting into the third out. Each at-bat results in one of three outcomes: an out, a batter occupying any one of the three bases, or a scoring scenario. At the end of nine innings, scoring plays known as runs determine the winning team. To score a run, a player must advance in order from 1st base to 2nd base to 3rd base and finally reach home plate safely.

1.2 Motivation

Typical offensive statistics in baseball can be found everywhere from the back of players' baseball cards to websites known as baseball reference pages. Most of these statistics can be classified as counting statistics, since they are determined simply by counting successes vs. failures. The most popular counting statistics are hits, runs, and runs batted in. Hits, denoted (H), [6] is the number of times the batter reached base as a result of hitting the ball with no outs made on the play and no mistake from the defense. Runs, denoted (R), [6] is the number of times the batter passed all three bases and reached home plate safely to score. Runs batted in, denoted (RBI), [6] is the number of runners that scored, attributed to the batter, as a result of him hitting the ball. These statistics serve as a very quick comparison between Player A and Player B. This type of comparison is sometimes all that is considered, not only by the casual fan but also by many top baseball broadcasters and commentators.

However, judging a player based solely on counting statistics does not fully represent his offensive value. This is due to some aspects of baseball that are out of the player's control. Counting statistics like R and RBI have more to do with the success of those who bat before and after the player in question. For instance, to get an R, a player must reach base, pass all three bases, and safely reach home plate. But if Player A is consistently reaching base safely while the players after him are not successful at scoring him, then Player A does not get an R credited to him. On the other hand, suppose Player B reaches base safely at the same rate as Player A. If the players hitting behind Player B in the batting order are successful at scoring him, then Player B will get credit for an R. Thus, when comparing Player A and Player B, Player B will have more runs, though he may not necessarily be a better offensive player. A similar argument can be made for RBIs. This category is based on how many times a run scores as a result of a player's AB. The number of RBIs is predicated on the number of opportunities the batter has with runners on base. Player A, a below average hitter, could have many opportunities for an RBI while succeeding at a very poor rate. Meanwhile, Player B, an above average hitter, may have far fewer chances for an RBI but succeed at a much higher rate, and end the season with the same number of RBIs as Player A. Thus, counting statistics such as R and RBI make it difficult to accurately compare players with similar statistics.

Percentage statistics offer greater insight into the quality of the player by looking at how successfully the player performs in different situations. A typical percentage statistic is batting average, denoted (BA), [6] which is the success rate of getting a hit. Batting average is calculated by dividing a batter's number of hits by his number of at-bats, BA = H/AB. But even percentage statistics like BA, while telling us how well the player performed individually, still need to be evaluated alongside the impact of the players around him. For example, if the player is on a poorly performing team, the opponent will try to prevent the good player from making a big impact whenever possible. This means that the player will have fewer opportunities, resulting in deceptive statistics. The main point that these cases illustrate is that judging a player based on counting statistics and percentage statistics does not provide enough information for a comprehensive assessment of a player's offensive value. Each player has unique circumstances surrounding each AB that play a role in his offensive production and, in turn, his offensive statistics. In order to attain a better analysis of a player's offensive value, we need a platform in which all players are evaluated with the same surrounding circumstances. Many algorithms and methods for evaluating players beyond typical statistics already exist; most of them deal with predicting an individual player's statistics, which may not reflect the true value of a player. Fantasy baseball [5] is a great example of players being evaluated and judged based on the numbers they have produced and are predicted

to produce for the remainder of the season. A more valuable and insightful statistic by which to judge a player is his overall contribution to his team winning. Statistics of this nature have already been examined; the most well known is a sabermetric statistic called WAR, [1] which measures wins above replacement. This statistic incorporates how a player performs defensively and offensively, including base-running and hitting, and produces a number for how many more wins the team will have during the season if this player played in every game. Others have come to a similar realization that alternative methods of analysis need to be performed. A Markov Chain Model of run-scoring in baseball has been used by several researchers to model the progression of a half-inning of baseball. R.E. Trueman in Analysis of baseball as a Markov process defined a half-inning as a 25-state Markov chain in his analysis. [14] In fact, many statisticians have used the Markov Chain Model to simulate an entire game. For example, in Markov Chain Approach to Baseball, Bruce Bukiet, Elliotte Rusty Harold and José Luis Palacios applied a Markov Chain Model to determine optimal batting orders. [9] By developing algorithms to find the optimal order out of the 9! possible orders, they created a method to find the runs distribution per half inning and per entire game using a 25-state Markov chain. In addition, they determined the expected number of games a team should win in an entire 162 game season. Meanwhile, Nobuyoshi Hirotsu and Mike Wright in A Markov Chain Approach To Optimal Pinch Hitting

Strategies In a Designated Hitter Rule Baseball Game applied a Markov chain model to optimize the pinch hitting strategy. [10] This type of analysis allowed them to determine whether it was optimal to use a pinch hitter with a high likelihood of hitting a home run but also a high likelihood of making an out instead of a batter with a low likelihood of a home run but also a low likelihood of making an out.

1.3 Purpose of Thesis

For our analysis, we are attempting to determine a player's contribution to his team by considering only his offensive impact. In baseball there are two types of plays, batted plays and non-batted plays. Non-batted plays are primarily plays that involve base runners advancing during at-bats. [11] They are classified as steal attempts, balks, wild pitches, passed balls and errors on pick off attempts. Batted plays are defined as those not included in non-batted plays, such as hits and walks. As evidenced by Table 1.1, non-batted plays have accounted for only about 3% of all plays in the last three seasons. [7] Thus non-batted plays occur rarely and are inherently unpredictable, and as a result are difficult to model. For the purpose of this thesis, we will disregard the effects of non-batted plays on a player's offensive production and focus solely on batted plays. As we have demonstrated, there are various circumstances that impact a player's

offensive success during each AB. Furthermore, the traditional statistics used to assess a player offensively do not take these factors into account. Therefore, many players are not being valued properly for their offensive skills. For example, oftentimes a player's offensive production is determined by his position in the lineup. We want to create a method which compares players' offensive contributions in the same surrounding circumstances to allow for a more accurate comparison. We will only take into account batted plays to eliminate many of the irregularities of non-batted plays. This will enable us to measure solely a player's offensive capabilities and, as a result, his contribution to the number of runs per game his team scores. Thus, the goal of this thesis is to develop a statistical method using a Markov Chain Model to compare a player's offensive skill set for a given slot in the batting order across the entire league.

year    % non-batted plays    % steal attempt plays    % balk, WP, PB, errors
...     ...                   2.3%                     1.1%
...     ...                   1.9%                     1.2%
...     ...                   2.0%                     1.2%

Table 1.1: Percent of plays during seasons, where WP is wild pitch, PB is passed ball, and errors are errors on pick off attempts

Chapter 2

Theory Behind Simulation

2.1 Background

Analyzing data is a prevalent aspect of nearly all fields of study. It indicates what has already happened and provides information on what will potentially happen in the future. In order to have worthwhile data analysis, a large sample needs to be collected. Using the collected data we can model future success based on previous results. Simulations serve as a process that enables the user to imitate situations of real world systems over an extended period of time. In order to perform a simulation, there must first be a developed model, which represents the key characteristics or functions of the system at hand. The model represents the system itself, whereas the simulation represents the operation of the system over time. Simulations are also used when the real system is unavailable, for reasons such as the system is not accessible to use, or it is being designed but not yet built, or simply it does not

exist. There are many types of simulations that are run today, such as the Monte Carlo simulation, which enables one to model situations that present uncertainty and play them out thousands of times on a computer. These simulations allow one to better estimate the probability of events being successful. From this data, one can draw more accurate conclusions about the information at hand, and better predict future events. In terms of baseball, one can use simulations to create a platform in which every player is placed in the same situation and analyzed based on how well he performs in that situation. We will apply this knowledge of simulations to studying baseball statistics. In the game of baseball every event and situation is charted and recorded, and can be expressed through data sets. In particular we can examine the statistics that accumulate throughout the course of a game, season and career of a player or a team. This access to information allows us to use simulations to gain insight into which strategies are most effective in certain situations that may arise throughout the course of a game.

2.2 Markov Chain

A Markov chain is a probability model that is used to characterize movement between different locations called states. A Markov chain is made up of absorbing states and transitional states. Transitional states are states that can move to other states. Absorbing states are final locations; once in an absorbing state, movement is not

permitted to any other state. [12] [13] [11] Formally, consider a set of random variables $X_1, X_2, \ldots, X_n$ where $X_i$ for $i = 1, \ldots, n$ represents the state of the system at time $i$, for the possible states of the system $1, \ldots, n$. There exists a set of numbers $P_{ij}$, where both $i$ and $j$ range from $1, \ldots, n$, such that when the process is in state $i$, the transition to the next state $j$ will have probability $P_{ij}$:

$$P_{ij} = \Pr(X_{n+1} = j \mid X_n = i)$$

Thus the set $\{X_i \mid i \in 1, \ldots, n\}$ is called a Markov chain with transitional probabilities $P_{ij}$, where $P_{ij}$ satisfies the following conditions:

$$0 \le P_{ij} \le 1$$

for all combinations of $i$ and $j$, and

$$\sum_{j=1}^{n} P_{ij} = 1$$

where $i = 1, \ldots, n$, since the process must transition to some state $j$ once it leaves state $i$. We are in an absorbing state if and only if $P_{ii} = 1$

and $P_{ij} = 0$ where $i \neq j$; otherwise we are in a transitional state. [12] To use a Markov chain some restrictions have to be met. First, there needs to be a finite number of outcomes or states. Second, the probability for each possible state that the process initially occupies must be known. And finally, the probability of a given state can depend only on the previous state and not on the events leading up to that state. If a sequence of states has the Markov property, then every future state is conditionally independent of every prior state, given the current state. In essence this discredits all notions of momentum being a factor in affecting outcomes for our situation. [11] [12] In order to model a baseball game we need to establish the variables that will affect our system. Since the game is not measured by a clock but rather play by play, baseball lends itself very nicely to study as a discrete process. We want to look at each at-bat, in particular the situation before the at-bat takes place and the result of the at-bat. Thus we need to know the probability of generating a certain result, which will vary from player to player. Also, we need to know the positioning of the runners on base along with the number of outs prior to the at-bat, and the positioning of the runners and number of outs after the at-bat. These types of situations will be known as our states. A state is viewed as a description of the

runners on base and the number of outs in an inning. [11] There are 3 bases that can be occupied at any given at-bat, which results in $2^3 = 8$ possible ways that runners can occupy the three bases. At the same time there can be 0, 1, or 2 outs for each of the 8 base situations, resulting in 24 possible situations that can be present when a player comes to bat. Once the third out is made the inning is over; this will be the absorbing state in our Markov chain. [13] To prepare our data for simulation we first need a way to represent the 24 transition states. Each state is coded as XXX Y, where the X's going from left to right represent 1st, 2nd and 3rd base, and each takes the value 0 to represent an empty base or 1 to represent an occupied base. Y can take on the values 0, 1 or 2 to represent the number of outs in the current state. The absorbing state will be coded as 3 to represent three outs. Table 2.1 shows a few examples of different coded states and their representations.

Code to represent state    State of runners and outs
XXX Y                      1st base, 2nd base, 3rd base, number of outs
000 0                      No runners on base and no out
100 1                      Runner on 1st base and 1 out
110 2                      Runners on 1st and 2nd base and 2 out
001 0                      Runner on 3rd base and no out

Table 2.1: Examples of Code to represent states of system
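The state coding described above can be sketched in a few lines. This is a minimal illustration in Python rather than the R used for this thesis; the names `encode_state`, `all_states` and `ABSORBING_STATE` are hypothetical, and the ordering of the states is an assumption made only for the sketch.

```python
from itertools import product

ABSORBING_STATE = "3"  # three outs: the half-inning is over


def encode_state(first, second, third, outs):
    """Return the 'XXX Y' code for occupied bases (0 or 1 each) and outs (0-2)."""
    if outs not in (0, 1, 2):
        raise ValueError("a transient state has 0, 1, or 2 outs")
    return f"{int(first)}{int(second)}{int(third)} {outs}"


def all_states():
    """List the 24 transient states followed by the absorbing state.

    The ordering (by outs, then by bases) is an assumption of this sketch;
    the thesis does not state which ordering it uses.
    """
    transient = [
        encode_state(b1, b2, b3, outs)
        for outs in range(3)
        for b1, b2, b3 in product((0, 1), repeat=3)
    ]
    return transient + [ABSORBING_STATE]


states = all_states()
print(len(states))               # 25 states in total
print(encode_state(1, 1, 0, 2))  # runners on 1st and 2nd base, 2 outs
```

Enumerating the states once and reusing the list keeps the row and column order of every later matrix consistent.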

2.3 Transitional Matrix

In order to keep track of the movements from state to state a transitional probability matrix is created. The transitional matrix in our system will be a 24 × 25 matrix where the elements of the matrix represent each possible transition from state to state. Each inning begins with a new Markov chain with the starting state always being 000 0. There are 24 base-out states in which any given at-bat can start, and from each of them we can transition to the same 24 base-out states plus the 3 outs state. Once we transition to the 3 outs state the inning is over and our Markov chain ends. Every element in the matrix represents the probability of transitioning from state i to state j, hence the (i,j)th element of the matrix will be $P_{ij}$. [12] A transition from state i to state j exists only if $P_{ij} > 0$. Here are the situations in which the $P_{ij}$'s take on a value of 0:

- The number of outs decreases. A transition of this variety, for example from a one out state to a zero out state, cannot happen because the number of outs in an inning of a baseball game can only stay the same or increase after a play.
- The number of base runners added to the state increased by more than one. A transition of this variety cannot happen because only one base runner can be added to the state per at-bat, since only one at-bat takes place between states.
- Base runners went back to previous bases. A transition of this variety cannot happen because in baseball base runners are not allowed to retreat to a base they have already passed.
- No transitions occurred from state i to state j for that transitional matrix. The transition does not violate any restrictions and our system allows for such a transition, but the player did not come to bat when the state was i.
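The first two restrictions above can be checked directly from the state codes. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, and "more than one base runner added" is read as the combined count of runners and outs growing by more than one, which is how Section 3.4 describes it. The third restriction (runners retreating) and the fourth (no observed at-bats) cannot be detected from the codes alone and are not checked here.

```python
def parse_state(code):
    """Split an 'XXX Y' code into (runners_on_base, outs).

    The absorbing state '3' returns (None, 3) because the positions of the
    runners are not tracked once the inning is over.
    """
    if code == "3":
        return None, 3
    bases, outs = code.split()
    return bases.count("1"), int(outs)


def violated_rule(start, end):
    """Return the name of the first restriction a transition breaks, or None."""
    if start == "3":
        raise ValueError("an at-bat cannot start in the absorbing state")
    runners_i, outs_i = parse_state(start)
    runners_j, outs_j = parse_state(end)
    if outs_j < outs_i:
        return "outs decreased"
    if runners_j is None:
        # Simplification of this sketch: transitions into '3' are not
        # checked against the runner restriction.
        return None
    if (runners_j + outs_j) - (runners_i + outs_i) > 1:
        return "more than one runner added"
    return None


print(violated_rule("100 1", "100 0"))  # an out cannot be taken back
print(violated_rule("000 0", "110 0"))  # two runners cannot appear in one at-bat
print(violated_rule("000 0", "100 0"))  # a single: allowed
```

A check of this kind is useful mainly as a sanity test on collected data: any observed transition that it flags points to a coding error rather than a real play.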

Chapter 3

Data Collection

3.1 Collecting Transitional Matrix

We will be using data from the last three seasons for this project. We chose these seasons because they provide a large enough sample for our analysis. We did not want to use data from more years because we want the data to still model the players' current skill sets. Taking a larger data set would devalue the analysis of a player's worth today, as a player's skill set changes over time. To acquire a transitional matrix we must first download data from [7] for the particular season desired. Specifically we download a roster of all players for that season, a file with all plays that took place in that season, and a file that gives the names of all the categories under which the season's data are classified. Next we want to take a snapshot look at each play that occurred

during that season. In particular we want to record the occupancy of the bases and the number of outs prior to the play occurring, as well as the location of runners on base and number of outs after the play has occurred. We code this information as discussed before, and denote the results as our states. Next, since we are concerned with batted plays only for our model, we want to remove all plays that advance runners via non-batted plays. Thus plays such as steal attempts, wild pitches, passed balls, and errors trying to pick off a base runner are removed from the data set. We can accomplish this by using a variable called BAT_EVENT_FL. [11] This variable classifies all plays that are batted plays into one category. Taking the subset of our data set of all plays that have BAT_EVENT_FL equal to TRUE accomplishes the desired result. With the method described earlier to get the location of the base runners and number of outs, we come across a small issue when the outs are equal to 3. The method records the location of base runners when 3 outs are made. Thus there are 8 states that have 3 outs: (000 3, 100 3, 010 3, 001 3, 110 3, 011 3, 101 3, 111 3). For our model the location of runners when 3 outs are made is irrelevant. Therefore we will recode all 8 of these states with 3 outs simply as 3. [3] Now we have 24 states in which any at-bat can start and 25 states to which the at-bat can transition. The 25 ending states include the same 24 starting states plus the 3 state. The table function in R [3] organizes our data set into a 24 × 25 matrix called T, where the first vertical column represents the 24 starting states

before an at-bat, and the first horizontal row represents the 25 ending states after an at-bat. Every entry in the matrix ($T_{ij}$) represents the number of transitions from the starting state i to the ending state j. Note that the values in the matrix are not percentages but whole numbers, since the table function only collects the data and organizes the information in matrix form. The data in the matrix represent the number of times in a particular season that transitions from the starting state i to the ending state j occurred. To find the probabilities of transitioning from state i to state j we use the prop.table function in R. This function takes every $T_{ij}$ from the T matrix and divides it by the sum of the row that the entry occupies to find $P_{ij}$ for our desired transitional matrix:

$$P_{ij} = \frac{T_{ij}}{\sum_{k=1}^{25} T_{ik}}$$

Thus we now have a method for acquiring a transitional matrix for a particular baseball season.

3.2 Individual Player Transitional Matrix

The previous method finds a matrix that contains the transitions for all players in a particular baseball season. Now, we can find the transitional matrix for individual players. To acquire an individual player's matrix we need to remove all the at-bats

that are not taken by our desired player. To do this we need to arrange our data where BAT_EVENT_FL is equal to TRUE into a three dimensional matrix. The three categories will be the starting states, ending states and BAT_ID. Each player has a unique BAT_ID [7] [11] that is made up of the first four letters of the player's last name, followed by the first letter of the first name and then a three digit number. For example, Albert Pujols's BAT_ID is pujoa001. The roster file contains a list of all players for that year and the BAT_ID for those players. A player's BAT_ID is assigned on the first official at-bat of the player's career, and never changes from season to season. Our three dimensional matrix can be put together by once again using the table function in R for the three variables BAT_ID, starting state and ending state. This three dimensional matrix contains a two dimensional matrix for each BAT_ID, where the variables are starting states and ending states. To get a matrix for a particular player, we just use the player's BAT_ID to pick out the corresponding two dimensional matrix. This matrix represents the number of times the player transitioned from state i to state j. To find the probabilities, we again use the prop.table function in R. Thus we now have a method for acquiring a transitional matrix for a player in any baseball season. Since we will be using three seasons as our data, we need to acquire a transitional matrix for a player for those three years. To accomplish this we need to merge together the matrices for each individual year. We gather

the matrices by the process laid out above; however, we do not want to change the entries in the matrices to probabilities yet. We can add the three matrices together by using matrix addition in R. This creates a matrix that represents the number of transitions from starting state i to ending state j for a player across the three years. Now we are ready to change the matrix to represent the probabilities of transitioning from state i to state j with the prop.table function. We repeat this process to gather transitional matrices for any player desired. Figure 3.1 displays an example of a transitional matrix.

Figure 3.1: Example of Transitional Matrix
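The counting and normalizing steps of Sections 3.1 and 3.2 can be sketched as follows. This is a Python analogue of the R workflow, not the code used for this thesis: `count_transitions` plays the role of the table function, `add_counts` the role of matrix addition across seasons, and `to_probabilities` the role of prop.table applied by row. All three names and the sample plays are hypothetical.

```python
from collections import Counter


def count_transitions(plays):
    """Tally (start_state, end_state) pairs from a list of batted plays."""
    return Counter((start, end) for start, end in plays)


def add_counts(*season_counts):
    """Add the count matrices of several seasons entry by entry."""
    total = Counter()
    for counts in season_counts:
        total.update(counts)
    return total


def to_probabilities(counts):
    """Divide every count by the total of its row (its starting state)."""
    row_totals = Counter()
    for (start, _end), n in counts.items():
        row_totals[start] += n
    return {(s, e): n / row_totals[s] for (s, e), n in counts.items()}


# Made-up plays for one player in two seasons (illustrative data only).
season_a = [("000 0", "100 0"), ("000 0", "000 1"), ("000 0", "000 1")]
season_b = [("000 0", "000 0")]

merged = add_counts(count_transitions(season_a), count_transitions(season_b))
P = to_probabilities(merged)
print(P[("000 0", "000 1")])  # 2 of the 4 at-bats from "000 0" ended in an out
```

Adding the counts before normalizing weights each season by its number of at-bats; averaging the three yearly probability matrices instead would give a season with few at-bats the same weight as a full season.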

3.3 Average Player Transitional Matrix

For our model we will be building a lineup where each batter is the average MLB batter for that specific position in the batting order. To clarify further, the average hitter in the lead-off spot is found by taking all the players that hit first in the batting order during the season and gathering all the transitions that took place from starting state i to ending state j while they were in the first position of the batting order. This can be accomplished very much like the process in the previous section. Again, we want to arrange our data where BAT_EVENT_FL is equal to TRUE into a three dimensional matrix. The three categories will be the starting states, ending states and BAT_LINEUP_ID. [7] [11] Since our data contain all the plays that occurred during a particular season, by using BAT_EVENT_FL equal to TRUE we are only looking at the plays that occurred via an at-bat. For each play the data set keeps track of which player was at bat and in which position in the lineup he was batting when that play took place. There are nine possible batting positions to which any player can be assigned. The variable that keeps track of this batting position is BAT_LINEUP_ID. Our three dimensional matrix can be put together by once again using the table function in R for the three variables BAT_LINEUP_ID, starting state and ending state. This three dimensional matrix contains a two dimensional matrix for each BAT_LINEUP_ID, where the variables are starting states and ending states. To get a matrix for a

particular batting position, we just use the batting position number to pick out the matrix. This matrix represents the number of times a transition occurred from starting state i to ending state j for all players who batted in that particular batting position. To find the probabilities, we need to use the prop.table function in R. For our model we will use four seasons to acquire the transitional matrices for the average player in each batting position. We chose to use one additional season for the average player transitional matrices because the average player transitional matrix is the foundation of our model. We wanted to make sure the data were sufficient to model the typical player for each batting position. To get a transitional matrix for those four years we will need to merge together the matrices for each individual year. Adding the matrices together by using matrix addition creates a matrix that represents the number of transitions from starting states to ending states for a batting position across the four years. Now, we change the matrix to represent the probabilities of transitioning from starting states to ending states with the prop.table function. Finally we repeat this process for the remaining batting positions in the lineup.

3.4 Runs Matrix

Now that we have a method for finding transitional matrices our Markov process is complete; however, we are not ready to begin building our simulation yet. We

still need a way to know the number of runs scored in every possible transition from a starting state to an ending state. There are a few restrictions. First, when three outs are made the inning is over and no runs score. It is possible for the third out to be made on the bases after a run has already scored; however, this situation is quite rare and extremely unpredictable to model. Second, runs may score only on plays where the batter has an at-bat; thus steal attempts, pick-off attempts, balks, passed balls and wild pitches cannot result in a run in this model.

From earlier in the paper we learned that many transitions have a probability of 0. Clearly these transitions also have a runs scored value of 0. To find the number of runs scored for transitions that have a probability greater than 0, we find the difference between the total number of base runners and outs in the starting state and the same total in the ending state; this shows how the runners in the state have transitioned. We also need to account for the batter, so we add one to the total. In equation form:

\[
R_{ij} =
\begin{cases}
(\mathrm{baserunners}_i + \mathrm{outs}_i) - (\mathrm{baserunners}_j + \mathrm{outs}_j) + 1 & \text{if } P_{ij} > 0, \\
0 & \text{otherwise,}
\end{cases}
\]

where $\mathrm{baserunners}_i$ represents the number of base runners

in state i, $\mathrm{outs}_i$ represents the number of outs in state i, and $R_{ij}$ represents the number of runs scored when transitioning from starting state i to ending state j for the Runs Matrix R.[11][13]

If we applied the equation to every entry of the Runs Matrix, we would notice that some transitions from state i to state j have a negative value, yet the lowest number of runs that can be scored on any play is 0. However, every transition with a negative runs scored value has a probability of 0. These situations occur because the total number of runners on base and outs went up by more than one from state i to state j, while on any play only one player, the batter, is added, so the total can go up by at most one. For example, if the starting state i were 000 0 and the ending state j were 110 0, the equation would give $R_{ij} = -1$. Another issue occurs for cases where $P_{ij} = 0$ but $R_{ij} > 0$. To account for both, we fill the cells that have a probability of 0 with a runs scored value of 0 before using the equation to find the remaining $R_{ij}$'s. Our finished product is a 24 × 24 Runs Matrix called R, with entries $R_{ij}$ for each starting state i and each ending state j.
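The rule for the Runs Matrix can be sketched in R as follows. This is a minimal illustration and not the code used for the thesis: the helper name runs.matrix, the intermediate vectors and the toy probabilities are assumptions made for the example, and the state labels follow the "runners outs" format used in the appendix (for example "110 0").

```r
# Label the 24 base-out states as "<1st><2nd><3rd> <outs>", e.g. "110 0".
runners <- c("000", "001", "010", "011", "100", "101", "110", "111")
states <- sort(as.vector(outer(runners, 0:2, paste)))

# Number of base runners plus number of outs for each state.
n.runners <- sapply(strsplit(substr(states, 1, 3), ""),
                    function(b) sum(as.numeric(b)))
n.outs <- as.numeric(substr(states, 5, 5))
total <- n.runners + n.outs

# Hypothetical helper: build R from a 24 x 24 transition matrix P whose
# rows and columns are ordered like `states`.
runs.matrix <- function(P) {
  # (runners_i + outs_i) - (runners_j + outs_j) + 1 for every pair (i, j)
  R <- outer(total, total, "-") + 1
  R[P == 0] <- 0                 # impossible transitions score no runs
  dimnames(R) <- list(states, states)
  R
}

# Toy probabilities (made up) for a few transitions.
P <- matrix(0, 24, 24, dimnames = list(states, states))
P["000 0", "000 0"] <- 0.03      # solo home run
P["000 0", "100 0"] <- 0.25      # single with the bases empty
P["111 0", "000 0"] <- 0.01      # grand slam
R <- runs.matrix(P)
R["000 0", "000 0"]              # 1
R["111 0", "000 0"]              # 4
R["000 0", "110 0"]              # 0, because this transition has probability 0
```

Zeroing the cells with probability 0 after applying the formula has the same effect as filling them first, and it removes both the negative entries and the spurious positive ones in a single step.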

Chapter 4

Building Simulation Model

4.1 How to Simulate a Single Game

After establishing the transitional matrices and the Runs matrix, we are ready to build our simulation model. To start, we model how to simulate a half inning of a baseball game. In order to do this, we need to know the inputs that affect our model: the Runs matrix and the batting lineup. The Runs matrix tells us how many runs were scored on a particular transition. The batting lineup tells us the order in which the players bat; however, instead of the players' names we use the players' transitional matrices. To denote the transitional matrices of the different batting positions in the lineup we use $A_x$, where $x \in \{1, \ldots, 9\}$ represents the position in the lineup.

During the simulation we want to keep track of the number of runs scored throughout the game, and have the final output of the simulation be the total runs scored in the game. To do this we will set

runs = 0 at the beginning of the simulation and, after each transition from state i to state j, check the Runs matrix for the number of runs scored on the play. Once the number of runs scored on the play is found, we add that amount to the total runs scored for the game.

Since every player in the batting lineup must bat in order, we need a way to make sure that this order is maintained. To account for this, we use a counter variable x that represents the batter's position in the lineup. To begin the game, x = 1 represents the first batter. After each transition the counter goes up by one: x = x + 1. When the counter passes 9 it returns to 1 and continues.

Every inning has the same starting state, i = 000 0,[13] the state with no runners on base and no outs. The half-inning simulation begins by examining $A_x$, the transitional matrix of the batter in position x of the lineup. We pick out the row of $A_x$ that has starting state i, $A_x[i, ]$. From that row we keep only the ending states j whose $P_{ij} > 0$. The remaining entries are the probabilities of transitioning from our starting state i to the possible ending states j. Based on these probabilities we randomly select one entry, which becomes our ending state j; the transition for the batter is represented by $A_x[i, j]$. We now check the number of runs scored on this transition by looking at $R_{ij}$ and add this value to our total runs for the game:

runs = $R_{ij}$ + runs
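The selection of a single ending state can be sketched in R as below. The helper next.state and the one-row toy matrix are hypothetical and only illustrate the step described above: keep the ending states with positive probability and draw one of them with those probabilities.

```r
# Hypothetical helper: draw the ending state for one plate appearance.
# A is a transition matrix whose rows and columns are named by state.
next.state <- function(A, state) {
  probs <- A[state, ]                      # row for the starting state
  possible <- names(probs)[probs > 0]      # ending states that can occur
  sample(possible, 1, prob = probs[probs > 0])
}

# Toy row (made-up probabilities) for the starting state "000 0";
# "010 0" is included with probability 0 to show that it is never drawn.
A <- matrix(c(0.05, 0.70, 0.25, 0),
            nrow = 1,
            dimnames = list("000 0", c("000 0", "000 1", "100 0", "010 0")))

set.seed(1)
draws <- replicate(5000, next.state(A, "000 0"))
table(draws) / length(draws)   # observed shares, close to the probabilities in A
```

The observed shares approach the row of A as the number of draws grows, which is the property the simulation relies on.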

After this we set our ending state j to be our new starting state i and repeat the process for the next batter by setting x = x + 1. This process would go on forever unless we have a way to terminate it. In our model an inning ends once three outs have been recorded, so when the state becomes 3 the inning is over. To have our model account for this, we check the ending state after each transition. This is accomplished with a while loop. A while loop repeats a process and stops only once it encounters a case that violates the desired condition. When our model's ending state becomes 3, the condition is violated and the loop terminates.

We want our model to simulate a half inning of a baseball game 9 times, where an inning ends with batter number x and the next inning begins with batter number x + 1. We can accomplish this with a for loop, a construct that performs a process a designated number of times. Our model runs a while loop inside a for loop over innings one through nine and outputs the number of runs scored in the game.

Table 4.1 shows an example of a simulated game, where the states after each transition are shown along with an interpretation of the transitions. The transitions in Table 4.1 marked with ** are some of the transitions that can occur through multiple outcomes. For example, such a transition may tell us that one out occurred on the play, but does not tell us which player

is the base runner or how the out was made. This information is irrelevant for our simulation model because we are concerned with a player's ability to contribute to the runs scored for the team, and not with individual statistics that would be counted if we kept track of how the transition occurred.

Output  Interpretation
------  ------------------------------------
        Start of Inning
        Batter out
        Batter out
3       Batter out
1       End of 1st Inning
0       Runs scored in 1st Inning
        Start of Inning
        Batter out
        Batter reached 1st
3       Batter hit into double play
2       End of 2nd Inning
0       Runs scored in 2nd Inning
        Start of Inning
        Batter out
        Batter reached 1st
**      Batter out or baserunner out
3       Third out made
3       End of 3rd Inning
0       Runs scored in 3rd Inning
        Start of Inning
        Batter out
        Batter reached 3rd
        Batter reached 2nd, run scores
        Batter out
3       Third out made
4       End of 4th Inning
1       Runs scored in 4th Inning
        Start of Inning
        Batter reached 1st
**      Batter out or baserunner out
3       Batter hit into double play
5       End of 5th Inning
0       Runs scored in 5th Inning
        Start of Inning
        Batter reached 1st
**      Out made and runner on 2nd
**      Batter out
        Batter reached 1st, run scores
        Batter reached 3rd, run scores
3       Third out made
6       End of 6th Inning
2       Runs scored in 6th Inning
        Start of Inning
        Batter out
        Batter out
3       Batter out
7       End of 7th Inning
0       Runs scored in 7th Inning
        Start of Inning
        Batter out
        Batter out
3       Batter out
8       End of 8th Inning
0       Runs scored in 8th Inning
        Start of Inning
        Batter reached 1st
**      Batter out or baserunner out
        Batter reached 1st
        Batter reached 1st, run scores
**      Batter out or baserunner out
        Batter reached 1st, run scores
        Batter reached 1st
        Batter reached 2nd, 2 runs score
3       Third out made
9       End of 9th Inning
4       Runs scored in 9th Inning
7       Total Runs for the game

Table 4.1: Example of a single game simulation
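The nested loops described in this section can be sketched in R as follows. This is a simplified illustration under stated assumptions rather than the thesis code: the lineup is passed as a list of nine matrices, the three-out state is the column "3", and the toy model (toy.matrix), in which every batter either homers or is out, is made up so that the example runs on its own.

```r
# Hypothetical sketch: simulate nine half innings for a nine-man lineup.
# lineup: list of nine transition matrices (rows and columns named by
#         state, with "3" as the three-out column); R: runs matrix.
simulate.game <- function(lineup, R) {
  runs <- 0
  x <- 1                                    # batting position counter
  for (inning in 1:9) {
    state <- "000 0"                        # every half inning starts here
    while (state != "3") {                  # stop once three outs are made
      probs <- lineup[[x]][state, ]
      new.state <- sample(names(probs), 1, prob = probs)
      if (new.state != "3") runs <- runs + R[state, new.state]
      state <- new.state
      x <- x %% 9 + 1                       # after 9 the counter returns to 1
    }
  }
  runs
}

# Toy model (made up): a batter homers with probability p, otherwise is out.
from <- c("000 0", "000 1", "000 2")
toy.matrix <- function(p) {
  A <- matrix(0, 3, 4, dimnames = list(from, c(from, "3")))
  A["000 0", c("000 0", "000 1")] <- c(p, 1 - p)
  A["000 1", c("000 1", "000 2")] <- c(p, 1 - p)
  A["000 2", c("000 2", "3")] <- c(p, 1 - p)
  A
}
R <- diag(3)                                # staying in the same state = home run = 1 run
dimnames(R) <- list(from, from)

set.seed(42)
lineup <- rep(list(toy.matrix(0.1)), 9)
simulate.game(lineup, R)                    # total runs in one simulated game
```

Because the counter x is advanced after every plate appearance, including the one that produces the third out, the next inning automatically begins with the following batter.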

4.2 How the Model Will Simulate

Simulating a single game is useful, yet not very reliable for analysis. There is too much irregularity that can affect a single game's results. To get a more accurate analysis we simulate many games at one time, which averages out some of the irregularities and the rare occurrences that could significantly affect a single game. To simulate more than one game at a time, we use the replicate [3] function in R. The replicate function runs the simulation of a single baseball game a specified number of times, and after every simulated game is complete the runs scored in that game are stored in a runs vector. Once the specified number of simulations is complete, we take the mean of the runs vector by adding up all the entries and dividing by the number of simulations. This gives the average runs per game for the lineup used to run the simulations.

For our analysis we run three simulations of 100,000 games each to ensure accuracy and take the average of the three results to represent our runs per game. Thus, whenever we refer to a value of runs per game, keep in mind that all values were created using the same number of simulations. We found that when running fewer simulations the number of runs per game fluctuates within a much larger interval than preferred, making the results less consistent.
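The batching described above can be sketched in R as below. The function simulate.game here is only a stand-in that returns a made-up run total from a Poisson draw, so that the example is self-contained; in the model it would be the single-game simulation. The helper runs.per.game and the reduced batch size are likewise assumptions for the example.

```r
# Stand-in for the single-game simulation (hypothetical): returns a made-up
# run total so that the example runs on its own.
simulate.game <- function() rpois(1, lambda = 4.1)

# Average runs per game over n.games simulated games.
runs.per.game <- function(n.games) {
  runs <- replicate(n.games, simulate.game())   # one entry per game
  mean(runs)
}

set.seed(2015)
# Three batches; the thesis uses 100,000 games per batch, reduced here
# so that the example runs quickly.
batches <- replicate(3, runs.per.game(10000))
batches            # the three batch averages
mean(batches)      # value reported as runs per game
```

Comparing the spread of runs.per.game(100) with that of runs.per.game(10000) over repeated calls shows the fluctuation that motivates the large number of games.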

As mentioned in section 3.3, our model builds a lineup in which every $A_x$ is the average player's transitional matrix for batting position x. We also saw in section 3.3 how to acquire these transitional matrices. Table 4.2 displays this lineup. We use this standardized lineup to create the surrounding circumstances that are used to evaluate different players. To evaluate any player at position x in the lineup, we replace $A_x$ with the transitional matrix of the player in question and run the simulations to obtain the runs scored per game for that player. We can find the runs scored per game in all positions x for any player desired. For the rest of the paper, when we mention the runs per game of position x, it can be interpreted as replacing $A_x$ with the player's transitional matrix while keeping the rest of the lineup as is. We can now run the simulations and determine the runs per game a hypothetical lineup would score with a certain player in different batting positions.

Batting Position   Transitional matrix
----------------   -------------------------
$A_1$              Average Position 1 batter
$A_2$              Average Position 2 batter
$A_3$              Average Position 3 batter
$A_4$              Average Position 4 batter
$A_5$              Average Position 5 batter
$A_6$              Average Position 6 batter
$A_7$              Average Position 7 batter
$A_8$              Average Position 8 batter
$A_9$              Average Position 9 batter

Table 4.2: Standardized lineup for model
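Substituting a player into the standardized lineup one position at a time can be sketched in R as follows. The helper evaluate.player, the function toy.sim and the one-cell matrices are hypothetical: toy.sim is a made-up scoring rule used purely so that the example runs, and in the model it would be replaced by the average runs per game from the simulations.

```r
# Hypothetical helper: runs per game with `player` substituted at each
# batting position x in turn. `sim` is any function that takes a lineup
# (list of nine matrices) and returns a runs-per-game value.
evaluate.player <- function(standard.lineup, player, sim, positions = 1:9) {
  sapply(positions, function(x) {
    lineup <- standard.lineup
    lineup[[x]] <- player          # replace A_x, keep the other eight batters
    sim(lineup)
  })
}

# Made-up stand-ins so the example is self-contained: each "matrix" is a
# single number and earlier batting positions carry more weight.
standard.lineup <- rep(list(matrix(0.1)), 9)
player <- matrix(0.3)
toy.sim <- function(lineup) sum(sapply(lineup, function(A) A[1, 1]) * (9:1))

by.position <- evaluate.player(standard.lineup, player, toy.sim)
round(by.position, 2)              # one value per batting position 1 to 9
```

Copying the standardized lineup inside the loop keeps the other eight batters unchanged for every position tested, which is what makes the values comparable across positions.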

Chapter 5

Distribution and Ranking of Each Batter Position

5.1 Process for Finding Random Players

In order to judge the significance of a player's runs per game value in a batting position, we need a way to measure the results against all players. To achieve this, we took a random sample of 30 players from the 2014 roster and analyzed their results. [2] [8] However, a few restrictions have to be incorporated. First, the random player chosen has to have enough data to produce results in our simulation, which means at least 1000 at-bats over the baseball seasons used. This ensures that a player's transitional probabilities represent his true skill and not a streak resulting from a limited number of at-bats. Second, a player has to have transitions from all 24 possible starting states in the transitional matrix. Often when a player

has a limited number of at-bats, not enough data is available to represent all starting states. There are also rare occurrences when a player has more than 1000 at-bats but still has a missing starting state. An example of such a situation occurs for some players who are lead-off hitters in the National League. They do not have any at-bats when the situation is a runner on third base with no outs. For the player to come to bat in this situation, a pitcher or a pinch hitter has to somehow reach third base. This is quite rare, as most pitchers are poor hitters and not very fast runners. The simulation cannot work if there is a missing starting state in the transitional matrix, because if the simulation happens to be in that state when the player comes to bat, it has no state to transition to. Thus, when picking our random players it is crucial to check that they meet these two restrictions.

Our method for finding random players was to look at [2] and order the list of players by at-bats taken, from highest to lowest. We then ran the sample function in R, which produces a list of random numbers that we used to index our list of players from baseball-reference.com. Lastly, we went through the list until we had gathered 30 players who met the requirements stated above.

5.2 Testing Data

Once we had our 30 random players, we ran the simulations and found the average number of runs per game scored by the standardized lineup for each random player

in the first batting position. At this point we needed a way of testing the significance of the values found for the players, to judge how one compares to another. We found the mean and standard deviation of the runs per game scored by the 30 random players once they were inserted into the standardized lineup. Figure 5.1 shows the histogram and normal Q-Q plot for all 30 players. We noticed that the data closely resembles a normal distribution. The next step was to test the distribution for normality. Using the Shapiro-Wilk test in R, we reject the null hypothesis if our p value is less than 0.05. However, we want our data to be as close to normal as possible, which we take to be indicated by a p value closer to 1. Table 5.1 shows the results of the test.
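The normality check can be sketched in R as below. The 30 runs-per-game values are randomly generated stand-ins, not the thesis results, and the variable names are assumptions; shapiro.test is the base R implementation of the Shapiro-Wilk test.

```r
# Made-up runs-per-game values standing in for the 30 sampled players.
set.seed(30)
rpg <- rnorm(30, mean = 4.1, sd = 0.05)

mean(rpg)                          # sample mean of runs per game
sd(rpg)                            # sample standard deviation

result <- shapiro.test(rpg)        # null hypothesis: the data are normal
result$p.value
reject <- result$p.value < 0.05    # TRUE would mean rejecting normality
reject
```

Calling hist(rpg) and qqnorm(rpg) followed by qqline(rpg) on the same vector produces the kind of histogram and normal Q-Q plot used for the visual check.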

[Figure: two panels, "Histogram" (Frequency against runs per game) and "Normal Q-Q Plot" (against Theoretical Quantiles)]

Figure 5.1: Histogram and normal Q-Q plot of runs per game for the 30 random players in the lead-off position of the standardized lineup.

Test for normality
Null Hypothesis: data is normal
Shapiro-Wilk normality test in R
mean = , sd = , p-value =
Reject Null Hypothesis if p < 0.05
Thus fail to reject null hypothesis
Conclusion: can assume normally distributed data

Table 5.1: Shapiro-Wilk test for 30 random players in 1st batting position of standardized lineup

While 30 random players serve as a valid sample size, using a larger sample of random players would give a more thorough analysis. Unfortunately, the process of finding random players and running the simulations is quite time consuming. When we gather our random sample of 30 players, the sample may not represent players of various skill levels equally. This is mostly due to the small sample size, and with a larger sample this issue would subside. To account for this we impose a restriction on the random sample in order to ensure a valid result: we want the p value to be at least 0.5. Once we find a random list of 30 players with a p value of 0.5 or greater, we take this to indicate that our sample is a good representation of the diverse skill sets of various players.

Once we have an adequate random sample of 30 players, we can apply this list to finding the simulation results for the remaining batting positions. For our analysis we chose to examine the first five batting positions. Table 5.3 shows the runs per game scored by the standardized lineup with each random player inserted for the five batting positions examined. For example, when Goldschmidt replaces the lead-off batter in the standardized lineup, the lineup

scores the average runs per game listed in his 1st column; when he replaces the second batter of the lineup, it scores the average listed in his 2nd column; when he replaces the third, the average listed in his 3rd column; and so on. To examine the remaining batting positions, one can extend the steps described above. Table 5.2 shows the results of the normality test for batting positions 2-5 using the same 30 random players, and Figures 5.2, 5.3, 5.4 and 5.5 show the histograms and normal Q-Q plots for these respective batting positions. The results are similar to what we found for the lead-off position, so we assume that the values for all of these batting positions are normally distributed.

Data          2nd position   3rd position   4th position   5th position
Mean
sd
p-value
data normal   yes            yes            yes            yes

Table 5.2: Normality test for batting positions 2-5

[Figure: two panels, "Histogram" (Frequency against runs per game) and "Normal Q-Q Plot" (against Theoretical Quantiles)]

Figure 5.2: Histogram and normal Q-Q plot of runs per game for the 30 random players in the second position of the standardized lineup.

[Figure: two panels, "Histogram" (Frequency against runs per game) and "Normal Q-Q Plot" (against Theoretical Quantiles)]

Figure 5.3: Histogram and normal Q-Q plot of runs per game for the 30 random players in the third position of the standardized lineup.

[Figure: two panels, "Histogram" (Frequency against runs per game) and "Normal Q-Q Plot" (against Theoretical Quantiles)]

Figure 5.4: Histogram and normal Q-Q plot of runs per game for the 30 random players in the fourth position of the standardized lineup.

[Figure: two panels, "Histogram" (Frequency against runs per game) and "Normal Q-Q Plot" (against Theoretical Quantiles)]

Figure 5.5: Histogram and normal Q-Q plot of runs per game for the 30 random players in the fifth position of the standardized lineup.

Last name     First name   1st   2nd   3rd   4th   5th
-----------   ----------   ---   ---   ---   ---   ---
Infante       Omar
Stubbs        Drew
Goldschmidt   Paul
Ramirez       Alexi
Arencibia     J.P.
Reyes         Jose
Smoak         Justin
Ibanez        Raul
Teixeira      Mark
Lawrie        Brett
Headley       Chase
Cain          Lorenzo
Freese        David
McCann        Brian
Suzuki        Kurt
Weeks         Rickie
Cabrera       Melky
Pagan         Angel
Sandoval      Pablo
Gordon        Alex
Marte         Starling
Espinosa      Danny
Jones         Adam
Seager        Kyle
Butler        Billy
Moss          Brandon
Jones         Garrett
Cano          Robinson
Zimmerman     Ryan
Martinez      J.D.

Table 5.3: Random 30 players runs per game for batting positions 1-5 of standardized lineup

5.3 Finding a Player's Rank by Batting Position

Now that we know the data follows a normal distribution, we can examine how a player's contribution to the standardized lineup's runs per game ranks among all the players for a specific batting position. This is achieved by finding the percentile rank of the player's runs per game in the desired batting position. The function pnorm in R gives the proportion of players expected to have a lower runs per game value than the player in question. Turning this value into a percentage creates a ranking system. Table 5.4 shows the pnorm function and its components.

pnorm(q, mean, sd, lower.tail = TRUE)

q                   runs per game of player for batting position
mean                mean of distribution for batting position
sd                  standard deviation of distribution for batting position
lower.tail = TRUE   finds area under the curve to the left of q value

Table 5.4: Explanation of equation used for finding ranking
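As a small worked example of the ranking step, the sketch below wraps pnorm in a helper. The helper name percentile.rank and the numbers passed to it are made up for illustration; they are not values from the thesis.

```r
# Hypothetical helper: percentile rank of a player's runs per game within
# the fitted normal distribution for one batting position.
percentile.rank <- function(q, mean, sd) {
  100 * pnorm(q, mean = mean, sd = sd, lower.tail = TRUE)
}

# Made-up numbers: distribution mean 4.11, sd 0.05, player value 4.20.
# The player sits 1.8 standard deviations above the mean.
percentile.rank(4.20, mean = 4.11, sd = 0.05)   # about 96.4
```

A player exactly at the mean of the distribution would receive a rank of 50 under this scheme.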

Chapter 6

Results of the Model

Now that our process for evaluating players has been built, we need to examine the method's effectiveness. Recall that for the standardized lineup, which reflects the average hitter at every batting position, we used data from four baseball seasons to acquire the transitional matrices for these batting positions. The average runs scored in the Major Leagues[1][2][8] for those 4 years was 4.21 runs per game. Running the simulation for the standardized lineup produced an average of 4.11 runs per game. This difference of only 0.1 runs per game can be attributed to the removal of non-batted plays. While we did expect our runs per game to be lower than MLB's average because of removing non-batted plays, we were a bit surprised by how little influence non-batted plays have. Since non-batted plays account for only about 3% of all plays in an MLB season, it is reasonable that their influence is quite low.

For the model to be effective, it should corroborate information that is well known. For example, it should be able to determine who the best offensive players are regardless of which position they bat in the lineup. Table 6.1 displays how the model ranks a few of the top 50 players in the game as ranked by Fantasy Baseball over the last three seasons. [5]

First name   Last name   1st      2nd      3rd      4th      5th
Mike         Trout       99.90%   99.70%   99.50%   99.20%   99.60%
Miguel       Cabrera     99.90%   99.90%   99.90%   99.80%   99.90%
Joey         Votto       99.40%   99.70%   99.40%   99.20%   98.20%
Robinson     Cano        97.00%   96.60%   97.20%   96.10%   97.60%
Matt         Holliday    97.80%   97.90%   97.30%   98.50%   98.20%

Table 6.1: Rankings of top players in MLB

The model should also reveal information that is not as well known or clear. For example, the model should be able to distinguish between players who have similar statistics for the past three years and determine who is more valuable offensively to the team. Table 6.2 shows the rankings for two players with exactly the same BA and the same number of plate appearances over the past three years. [4] It is clear from Table 6.2 that Holliday makes a better contribution to his team's run scoring in most batting positions, despite the fact that Holliday and Freeman have exactly the same batting average. Moreover, the model should be able to distinguish between players with a similar

number of R's and RBI's. Table 6.3 focuses on players with similar R's and RBI's over the past three years and shows that these players are not equally valued by the model. [4] The table indicates that Cano is a far superior offensive contributor to his team than Pence in every single batting spot, despite having an almost identical number of R's and RBI's.

First name   Last name   PA   BA   1st   2nd      3rd      4th      5th
Matt         Holliday                    97.90%   97.30%   98.50%   98.20%
Freddie      Freeman                     89.20%   95.00%   90.30%   93.90%

Table 6.2: Rankings of players with similar PA (Plate Appearances) and BA (Batting Average)

First name   Last name   R    RBI  1st   2nd      3rd      4th      5th
Hunter       Pence                       60.40%   66.90%   53.30%   70.00%
Robinson     Cano                        96.60%   97.20%   96.10%   97.60%

Table 6.3: Rankings of players with similar R's and RBI's

Chapter 7

Conclusion and Future Work

The goal of building this model was to develop a method for evaluating players based on skill set and not on statistical measures that rely on factors inherently outside of the player's control. Although this model also aims to optimize runs per game, like many previous simulations that use Markov chain models, it evaluates players on an equal playing field and eliminates several circumstances for which the hitter is not responsible, in particular the outcomes of the at-bats of the players before and after the hitter in question. Previous Markov chain models have addressed issues such as optimal lineups of particular teams and pinch hitting strategy. This model offers a more insightful analysis of a player's skills and offensive contribution to his team winning. Examining a few results similar to those found in Tables 6.2 and 6.3 validates this goal.

Although this model is designed to evaluate a player's complete offensive skill set, it

does not account for offensive skills such as base running and speed. Certain players are well known for their ability to steal bases. In fact, for some players stealing bases is their greatest contribution. This model undervalues players of this caliber. Furthermore, a player known for stealing bases may affect the circumstances that the player after him faces when he is at bat. The threat of a steal creates added pressure on the defense, causing mistakes to occur more frequently, and it also affects the batter, as the pitcher's focus is divided between him and the base runner. This can often lead to a higher chance that the pitcher will throw a pitch favorable to the hitter, making it easier for him to get a hit.

The factor of speed also plays a role in a player's transitional matrix. If player A comes to bat with a slow runner on base, the likelihood that the base runner will score on a double is fairly low. However, if player B comes to bat with a fast runner on base, the likelihood that the base runner will score on a double is much greater. This is an example of a situation in which aspects of the game are outside of the player's control. Thus, despite the best efforts of this model to account for elements outside of the player's control, the nature of baseball makes it almost impossible to do so completely. For example, every team has a unique ballpark with different dimensions that factor into a team's offensive prowess. Some parks are considered more pitcher friendly, while others are more favorable to batters. In addition, the climate and altitude in these different locations can also play a role. In summary, there are many variables that affect a player's transitional matrix. Thus, while the model does take certain factors that

are out of the player's hands out of the analysis, there are far too many elements for which it is not clear how to model them.

For future work, we plan to incorporate into our model base running, the ability to determine which type of runners are on base, and how each transition occurred. One possibility is to expand the matrix so that the same state can be reached through several distinct kinds of transition. We would also increase the sample size used to determine the distributions for each batting position. Ideally, we would like to get values for all of the players in the league.

Overall, this model can be used to evaluate players for various reasons. Professional baseball teams can use this model to make decisions about which players to acquire, whether through free agency or trades, and to determine the optimal lineup to score more runs per game and thus gain the most value for their team.

Appendix

Code Used in R

To acquire a transitional matrix for a particular season, 2011 in this case, we use the following code. [11]

    # Base address of the Retrosheet event-file directory. The address is
    # missing from the listing as printed, so it must be supplied here.
    retrosheet.url <- ""

    parse.retrosheet2.pbp = function(season){
      # Uses the folders download.folder/zipped and download.folder/unzipped
      # under the working directory and calls the external program cwevent.
      download.retrosheet <- function(season){
        download.file(
          url=paste(retrosheet.url, season, "eve.zip", sep=""),
          destfile=paste("download.folder", "/zipped/", season, "eve.zip", sep="")
        )
      }
      unzip.retrosheet <- function(season){
        unzip(paste("download.folder", "/zipped/", season, "eve.zip", sep=""),
              exdir=paste("download.folder", "/unzipped", sep=""))
      }
      create.csv.file=function(year){
        # Convert the event files to one csv file with cwevent
        wd = getwd()
        setwd("download.folder/unzipped")
        if (.Platform$OS.type == "unix"){
          system(paste(paste("cwevent -y", year, "-f 0-96"),
                       paste(year,"*.EV*",sep=""),
                       paste("> all", year, ".csv", sep="")))} else {
          shell(paste(paste("cwevent -y", year, "-f 0-96"),
                      paste(year,"*.EV*",sep=""),
                      paste("> all", year, ".csv", sep="")))
        }
        setwd(wd)
      }
      create.csv.roster = function(year){
        # Combine the team roster files into one csv file
        filenames <- list.files(path = "download.folder/unzipped/")
        filenames.roster =
          subset(filenames, substr(filenames, 4, 11)==paste(year,".ROS",sep=""))
        read.csv2 = function(file)
          read.csv(paste("download.folder/unzipped/", file, sep=""),header=FALSE)
        R = do.call("rbind", lapply(filenames.roster, read.csv2))
        names(R)[1:6] = c("Player.ID", "Last.Name", "First.Name",
                          "Bats", "Pitches", "Team")
        wd = getwd()
        setwd("download.folder/unzipped")
        write.csv(R, file=paste("roster", year, ".csv", sep=""))
        setwd(wd)
      }
      cleanup = function(){
        # Remove the Retrosheet files that are no longer needed
        wd = getwd()
        setwd("download.folder/unzipped")
        if (.Platform$OS.type == "unix"){
          system("rm *.EVN")
          system("rm *.EVA")
          system("rm *.ROS")
          system("rm TEAM*")} else {
          shell("del *.EVN")
          shell("del *.EVA")
          shell("del *.ROS")
          shell("del TEAM*")
        }
        setwd(wd)
        setwd("download.folder/zipped")
        if (.Platform$OS.type == "unix"){
          system("rm *.zip")} else {
          shell("del *.zip")
        }
        setwd(wd)
      }
      download.retrosheet(season)
      unzip.retrosheet(season)
      create.csv.file(season)
      create.csv.roster(season)
      cleanup()
    }

    Roster2011 <- read.csv("roster2011.csv")
    data2011 <- read.csv("all2011.csv", header=FALSE)
    fields <- read.csv("fields.csv")
    names(data2011) <- fields[, "Header"]

    # Identify each half inning and count the runs scored on each play
    data2011$HALF.INNING <- with(data2011, paste(GAME_ID, INN_CT, BAT_HOME_ID))
    data2011$RUNS.SCORED <- with(data2011, (BAT_DEST_ID > 3) +
      (RUN1_DEST_ID > 3) + (RUN2_DEST_ID > 3) + (RUN3_DEST_ID > 3))

    # A state is the runners on base (e.g. "110") followed by the outs
    get.state <- function(runner1, runner2, runner3, outs){
      runners <- paste(runner1, runner2, runner3, sep="")
      paste(runners, outs)
    }

    # Starting state of each play
    RUNNER1 <- ifelse(as.character(data2011[,"BASE1_RUN_ID"])=="", 0, 1)
    RUNNER2 <- ifelse(as.character(data2011[,"BASE2_RUN_ID"])=="", 0, 1)
    RUNNER3 <- ifelse(as.character(data2011[,"BASE3_RUN_ID"])=="", 0, 1)
    data2011$STATE <- get.state(RUNNER1, RUNNER2, RUNNER3, data2011$OUTS_CT)

    # Ending state of each play
    NRUNNER1 <- with(data2011, as.numeric(RUN1_DEST_ID==1 | BAT_DEST_ID==1))
    NRUNNER2 <- with(data2011, as.numeric(RUN1_DEST_ID==2 | RUN2_DEST_ID==2 |
      BAT_DEST_ID==2))
    NRUNNER3 <- with(data2011, as.numeric(RUN1_DEST_ID==3 | RUN2_DEST_ID==3 |
      RUN3_DEST_ID==3 | BAT_DEST_ID==3))
    NOUTS <- with(data2011, OUTS_CT + EVENT_OUTS_CT)
    data2011$NEW.STATE <- get.state(NRUNNER1, NRUNNER2, NRUNNER3, NOUTS)

    # Keep plays where the state changed or a run scored
    data2011 <- subset(data2011, (STATE!=NEW.STATE) | (RUNS.SCORED>0))

    library(plyr)
    data.outs <- ddply(data2011, .(HALF.INNING), summarize,
                       Outs.Inning = sum(EVENT_OUTS_CT))
    data2011 <- merge(data2011, data.outs)
    # Keep complete half innings, then batting events only. The second
    # subset is applied to data2011C so that the first filter is retained.
    data2011C <- subset(data2011, Outs.Inning == 3)
    data2011C <- subset(data2011C, BAT_EVENT_FL == TRUE)

    # Collapse all three-out states into the single state "3"
    library(car)
    data2011C$NEW.STATE <- recode(data2011C$NEW.STATE,
      "c('000 3', '100 3', '010 3', '001 3', '110 3', '101 3', '011 3', '111 3') = '3'")

    T.matrix <- with(data2011C, table(STATE, NEW.STATE))   # transition counts
    P.matrix <- prop.table(T.matrix, 1)                    # row-wise probabilities

To find individual player transition matrices use

    T3 = with(dataseasonC, table(BAT_ID, STATE, NEW.STATE))
    T.matrix = T3[BAT_ID, , ]          # BAT_ID: ID of the player wanted

To find transition matrices by batting position use

    T4 = with(dataseasonC, table(BAT_LINEUP_ID, STATE, NEW.STATE))
    T.matrix = T4[BAT_LINEUP_ID, , ]   # BAT_LINEUP_ID: batting position 1-9

Code used for simulation of a single game:

    # The 24 base-out states followed by the three-out state "3". The printed
    # vector is cut off after "010 0"; the remaining entries are completed
    # here in the sorted order that table() gives the matrix columns.
    endingstate=c("000 0", "000 1", "000 2", "001 0", "001 1", "001 2",
                  "010 0", "010 1", "010 2", "011 0", "011 1", "011 2",
                  "100 0", "100 1", "100 2", "101 0", "101 1", "101 2",
                  "110 0", "110 1", "110 2", "111 0", "111 1", "111 2", "3")
    C=endingstate
    d=1:25
    inning=1:9

    # mat2 is a 9 by 9 matrix of batting lineups; row k is the lineup in which
    # the player takes position k. It is not defined in the listing; the line
    # below is one construction consistent with that description (an
    # assumption), using index 10 for the player and 1-9 for the average batters.
    mat2=t(sapply(1:9, function(k) replace(1:9, k, 10)))
    batter=function(k){rep(mat2[k, ], length.out=150)}

    ###### simulation of baseball game
    ### Line up
    A01=position1P.matrix
    A02=position2P.matrix
    A03=position3P.matrix
    A04=position4P.matrix
    A05=position5P.matrix
    A06=position6P.matrix
    A07=position7P.matrix
    A08=position8P.matrix
    A09=position9P.matrix
    A10=PlayerP.matrix
    Transitional.Matrix=list(A01, A02, A03, A04, A05, A06, A07, A08, A09, A10)

    # Note: simulate.game reads k and Transitional.Matrix from the global
    # environment; k is set by the for loop at the end of the listing.
    simulate.game=function(A01, A02, A03, A04, A05, A06, A07, A08, A09, A10, R){
      runs=0
      i=1
      j=batter(k)[i]
      for(inning in 1:9) {
        runs.in.inning=0
        state="000 0"
        while(TRUE) {
          Z=Transitional.Matrix[[j]]
          TP=Z[state,]
          C1=C[TP>0]                       # possible ending states
          C2=TP[TP>0]                      # their probabilities
          newstate=sample(C1, 1, prob=C2)
          if(newstate=="3") {
            break
          }
          runs=(R[state, newstate] + runs)
          runs.in.inning=(R[state, newstate] + runs.in.inning)
          state=newstate
          i=i+1
          j=batter(k)[i]
        }
        i=i+1                              # next inning starts with the next batter
        j=batter(k)[i]
        state="000 0"
      }
      runs
    }

To run a simulation for a player in different batting positions for multiple games use:

    shuffle.lineup=function(k) {
      batter(k)
      RUNS = replicate(100000, simulate.game(A01, A02, A03, A04, A05, A06, A07,
                                             A08, A09, A10, R))
      mean(RUNS)
    }
    m<-numeric(10)
    for(k in (1:9)){m[k]=shuffle.lineup(k)}

Bibliography

[1] Baseball Prospectus, [Online; accessed 2014].
[2] Baseball Reference, Major League Baseball statistics, [Online; accessed 2014].
[3] Companion to Analyzing Baseball Data with R, [Online; accessed 2014].
[4] FanGraphs baseball statistics, [Online; accessed 2015].
[5] Fantasy baseball, [Online; accessed 2014].
[6] The official site of Major League Baseball, [Online; accessed 2014].
[7] Retrosheet home page, [Online; accessed 2014].
[8] Sean Lahman's database, [Online; accessed 2014].
[9] B. Bukiet, E.R. Harold, and J.L. Palacios, A Markov chain approach to baseball, Operations Research 45 (1997).
[10] Nobuyoshi Hirotsu and Mike Wright, A Markov chain approach to optimal pinch hitting strategies in a designated hitter rule baseball game, Operations Research 46 (2003).
[11] Max Marchi and Jim Albert, Analyzing Baseball Data with R, CRC Press.
[12] Sheldon M. Ross, Simulation, fifth edition, Academic Press, 2013.
[13] Tom M. Tango, Mitchel G. Lichtman, and Andrew E. Dolphin, The Book: Playing the Percentages in Baseball, Potomac Books, Inc.
[14] R.E. Trueman, Analysis of baseball as a Markov process, Optimal Strategies in Sports (1977).

Simulating Major League Baseball Games

Simulating Major League Baseball Games ABSTRACT Paper 2875-2018 Simulating Major League Baseball Games Justin Long, Slippery Rock University; Brad Schweitzer, Slippery Rock University; Christy Crute Ph.D, Slippery Rock University The game of

More information

Major League Baseball Offensive Production in the Designated Hitter Era (1973 Present)

Major League Baseball Offensive Production in the Designated Hitter Era (1973 Present) Major League Baseball Offensive Production in the Designated Hitter Era (1973 Present) Jonathan Tung University of California, Riverside tung.jonathanee@gmail.com Abstract In Major League Baseball, there

More information

Using Markov Chains to Analyze a Volleyball Rally

Using Markov Chains to Analyze a Volleyball Rally 1 Introduction Using Markov Chains to Analyze a Volleyball Rally Spencer Best Carthage College sbest@carthage.edu November 3, 212 Abstract We examine a volleyball rally between two volleyball teams. Using

More information

ANALYSIS OF A BASEBALL SIMULATION GAME USING MARKOV CHAINS

ANALYSIS OF A BASEBALL SIMULATION GAME USING MARKOV CHAINS ANALYSIS OF A BASEBALL SIMULATION GAME USING MARKOV CHAINS DONALD M. DAVIS 1. Introduction APBA baseball is a baseball simulation game invented by Dick Seitz of Lancaster, Pennsylvania, and first marketed

More information

February 12, Winthrop University A MARKOV CHAIN MODEL FOR RUN PRODUCTION IN BASEBALL. Thomas W. Polaski. Introduction.

February 12, Winthrop University A MARKOV CHAIN MODEL FOR RUN PRODUCTION IN BASEBALL. Thomas W. Polaski. Introduction. Winthrop University February 12, 2013 Introdcution: by the Numbers and numbers seem to go together. Statistics have been kept for over a hundred years, but lately sabermetrics has taken this obsession

More information

Opleiding Informatica

Opleiding Informatica Opleiding Informatica Determining Good Tactics for a Football Game using Raw Positional Data Davey Verhoef Supervisors: Arno Knobbe Rens Meerhoff BACHELOR THESIS Leiden Institute of Advanced Computer Science

More information

Should pitchers bat 9th?

Should pitchers bat 9th? Should pitchers bat 9th? Mark Pankin SABR 37 July 26, 2007 St. Louis, Missouri Notes provide additional information and were reminders during the presentation. They are not supposed to be anything close

More information

Legendre et al Appendices and Supplements, p. 1

Legendre et al Appendices and Supplements, p. 1 Legendre et al. 2010 Appendices and Supplements, p. 1 Appendices and Supplement to: Legendre, P., M. De Cáceres, and D. Borcard. 2010. Community surveys through space and time: testing the space-time interaction

More information

A One-Parameter Markov Chain Model for Baseball Run Production

A One-Parameter Markov Chain Model for Baseball Run Production for Winthrop University April 13, 2013 s s and : A is an ideal candidate for mathematical modelling, as it has these features: a relatively small number of configurations, a relatively small number of

More information

Which On-Base Percentage Shows. the Highest True Ability of a. Baseball Player?

Which On-Base Percentage Shows. the Highest True Ability of a. Baseball Player? Which On-Base Percentage Shows the Highest True Ability of a Baseball Player? January 31, 2018 Abstract This paper looks at the true on-base ability of a baseball player given their on-base percentage.

More information

A Markov Model for Baseball with Applications

A Markov Model for Baseball with Applications University of Wisconsin Milwaukee UWM Digital Commons Theses and Dissertations December 2014 A Markov Model for Baseball with Applications Daniel Joseph Ursin University of Wisconsin-Milwaukee Follow this

More information

Building an NFL performance metric

Building an NFL performance metric Building an NFL performance metric Seonghyun Paik (spaik1@stanford.edu) December 16, 2016 I. Introduction In current pro sports, many statistical methods are applied to evaluate player s performance and

More information

Matt Halper 12/10/14 Stats 50. The Batting Pitcher:

Matt Halper 12/10/14 Stats 50. The Batting Pitcher: Matt Halper 12/10/14 Stats 50 The Batting Pitcher: A Statistical Analysis based on NL vs. AL Pitchers Batting Statistics in the World Series and the Implications on their Team s Success in the Series Matt

More information

Hitting with Runners in Scoring Position

Hitting with Runners in Scoring Position Hitting with Runners in Scoring Position Jim Albert Department of Mathematics and Statistics Bowling Green State University November 25, 2001 Abstract Sportscasters typically tell us about the batting

More information

2015 Winter Combined League Web Draft Rule Packet (USING YEARS )

2015 Winter Combined League Web Draft Rule Packet (USING YEARS ) 2015 Winter Combined League Web Draft Rule Packet (USING YEARS 1969-1972) Welcome to Scoresheet Baseball: the winter game. This document details the process of drafting your Old Timers Baseball team on

More information

The Rise in Infield Hits

The Rise in Infield Hits The Rise in Infield Hits Parker Phillips Harry Simon December 10, 2014 Abstract For the project, we looked at infield hits in major league baseball. Our first question was whether or not infield hits have

More information

DO YOU KNOW WHO THE BEST BASEBALL HITTER OF ALL TIMES IS?...YOUR JOB IS TO FIND OUT.

DO YOU KNOW WHO THE BEST BASEBALL HITTER OF ALL TIMES IS?...YOUR JOB IS TO FIND OUT. Data Analysis & Probability Name: Date: Hour: DO YOU KNOW WHO THE BEST BASEBALL HITTER OF ALL TIMES IS?...YOUR JOB IS TO FIND OUT. This activity will find the greatest baseball hitter of all time. You

More information

arxiv: v1 [stat.ap] 18 Nov 2018

arxiv: v1 [stat.ap] 18 Nov 2018 Modeling Baseball Outcomes as Higher-Order Markov Chains Jun Hee Kim junheek1@andrew.cmu.edu Department of Statistics & Data Science, Carnegie Mellon University arxiv:1811.07259v1 [stat.ap] 18 Nov 2018

More information

Table 1. Average runs in each inning for home and road teams,

Table 1. Average runs in each inning for home and road teams, Effect of Batting Order (not Lineup) on Scoring By David W. Smith Presented July 1, 2006 at SABR36, Seattle, Washington The study I am presenting today is an outgrowth of my presentation in Cincinnati

More information

Machine Learning an American Pastime

Machine Learning an American Pastime Nikhil Bhargava, Andy Fang, Peter Tseng CS 229 Paper Machine Learning an American Pastime I. Introduction Baseball has been a popular American sport that has steadily gained worldwide appreciation in the

More information

When Should Bonds be Walked Intentionally?

When Should Bonds be Walked Intentionally? When Should Bonds be Walked Intentionally? Mark Pankin SABR 33 July 10, 2003 Denver, CO Notes provide additional information and were reminders to me for making the presentation. They are not supposed

More information

Internet Technology Fundamentals. To use a passing score at the percentiles listed below:

Internet Technology Fundamentals. To use a passing score at the percentiles listed below: Internet Technology Fundamentals To use a passing score at the percentiles listed below: PASS candidates with this score or HIGHER: 2.90 High Scores Medium Scores Low Scores Percentile Rank Proficiency

More information

Clutch Hitters Revisited Pete Palmer and Dick Cramer National SABR Convention June 30, 2008

Clutch Hitters Revisited Pete Palmer and Dick Cramer National SABR Convention June 30, 2008 Clutch Hitters Revisited Pete Palmer and Dick Cramer National SABR Convention June 30, 2008 Do clutch hitters exist? More precisely, are there any batters whose performance in critical game situations

More information

2018 Winter League N.L. Web Draft Packet

2018 Winter League N.L. Web Draft Packet 2018 Winter League N.L. Web Draft Packet (WEB DRAFT USING YEARS 1981-1984) Welcome to Scoresheet Baseball: the 1981-1984 Seasons. This document details the process of drafting your 2010 Old Timers Baseball

More information

2013 Tulane National Baseball Arbitration Competition

2013 Tulane National Baseball Arbitration Competition 2013 Tulane National Baseball Arbitration Competition Dexter Fowler vs. Colorado Rockies Submission on Behalf of Mr. Dexter Fowler Midpoint: $4.3 million Submission by Team 38 Table of Contents I. Introduction

More information

CS 221 PROJECT FINAL

CS 221 PROJECT FINAL CS 221 PROJECT FINAL STUART SY AND YUSHI HOMMA 1. INTRODUCTION OF TASK ESPN fantasy baseball is a common pastime for many Americans, which, coincidentally, defines a problem whose solution could potentially

More information

Planning and Acting in Partially Observable Stochastic Domains

Planning and Acting in Partially Observable Stochastic Domains Planning and Acting in Partially Observable Stochastic Domains Leslie Pack Kaelbling and Michael L. Littman and Anthony R. Cassandra (1998). Planning and Acting in Partially Observable Stochastic Domains,

More information

MLB SHOWDOWN DCI Floor Rules Tournament Season Effective June 15, 2000

MLB SHOWDOWN DCI Floor Rules Tournament Season Effective June 15, 2000 MLB SHOWDOWN MLB SHOWDOWN DCI Floor Rules 1999 2000 Tournament Season Effective June 15, 2000 Introduction The MLB Showdown DCI Floor Rules work in conjunction with the DCI Universal Tournament Rules,

More information

Lorenzo Cain v. Kansas City Royals. Submission on Behalf of the Kansas City Royals. Team 14

Lorenzo Cain v. Kansas City Royals. Submission on Behalf of the Kansas City Royals. Team 14 Lorenzo Cain v. Kansas City Royals Submission on Behalf of the Kansas City Royals Team 14 Table of Contents I. Introduction and Request for Hearing Decision... 1 II. Quality of the Player s Contributions

More information

Minors Division (10u) Rules

Minors Division (10u) Rules Minors Division (10u) Rules Updated: 10/1/2018 These rules are to be interpreted in harmony with the latest version of the Babe Ruth (BR) Rule Book. Where they might deviate, go with the local rules, unless

More information

Table of Contents. Pitch Counter s Role Pitching Rules Scorekeeper s Role Minimum Scorekeeping Requirements Line Ups...

Table of Contents. Pitch Counter s Role Pitching Rules Scorekeeper s Role Minimum Scorekeeping Requirements Line Ups... Fontana Community Little League Pitch Counter and Scorekeeper s Guide February, 2011 Table of Contents Pitch Counter s Role... 2 Pitching Rules... 6 Scorekeeper s Role... 7 Minimum Scorekeeping Requirements...

More information

The MLB Language. Figure 1.

The MLB Language. Figure 1. Chia-Yen Wu chiayen@avaya.com June 6, 2006 The MLB Language 1. Introduction The MLB (Major League Baseball) language is designed to help an interested party determine certain characteristics of a baseball

More information

CHAPTER 1 ORGANIZATION OF DATA SETS

CHAPTER 1 ORGANIZATION OF DATA SETS CHAPTER 1 ORGANIZATION OF DATA SETS When you collect data, it comes to you in more or less a random fashion and unorganized. For example, what if you gave a 35 item test to a class of 50 students and collect

More information

Chapter 1 The official score-sheet

Chapter 1 The official score-sheet Chapter 1 The official score-sheet - Symbols and abbreviations - The official score-sheet - Substitutions - Insufficient space on score-sheet 13 Symbols and abbreviations Symbols and abbreviations Numbers

More information

Why We Should Use the Bullpen Differently

Why We Should Use the Bullpen Differently Why We Should Use the Bullpen Differently A look into how the bullpen can be better used to save runs in Major League Baseball. Andrew Soncrant Statistics 157 Final Report University of California, Berkeley

More information

DECISION MODELING AND APPLICATIONS TO MAJOR LEAGUE BASEBALL PITCHER SUBSTITUTION

DECISION MODELING AND APPLICATIONS TO MAJOR LEAGUE BASEBALL PITCHER SUBSTITUTION DECISION MODELING AND APPLICATIONS TO MAJOR LEAGUE BASEBALL PITCHER SUBSTITUTION Natalie M. Scala, M.S., University of Pittsburgh Abstract Relief pitcher substitution is an integral part of a Major League

More information

PREDICTING the outcomes of sporting events

PREDICTING the outcomes of sporting events CS 229 FINAL PROJECT, AUTUMN 2014 1 Predicting National Basketball Association Winners Jasper Lin, Logan Short, and Vishnu Sundaresan Abstract We used National Basketball Associations box scores from 1991-1998

More information

OFFICIAL RULEBOOK. Version 1.08

OFFICIAL RULEBOOK. Version 1.08 OFFICIAL RULEBOOK Version 1.08 2017 CLUTCH HOBBIES, LLC. ALL RIGHTS RESERVED. Version 1.08 3 1. Types of Cards Player Cards...4 Strategy Cards...8 Stadium Cards...9 2. Deck Building Team Roster...10 Strategy

More information

Online Companion to Using Simulation to Help Manage the Pace of Play in Golf

Online Companion to Using Simulation to Help Manage the Pace of Play in Golf Online Companion to Using Simulation to Help Manage the Pace of Play in Golf MoonSoo Choi Industrial Engineering and Operations Research, Columbia University, New York, NY, USA {moonsoo.choi@columbia.edu}

More information

How to Make, Interpret and Use a Simple Plot

How to Make, Interpret and Use a Simple Plot How to Make, Interpret and Use a Simple Plot A few of the students in ASTR 101 have limited mathematics or science backgrounds, with the result that they are sometimes not sure about how to make plots

More information

TOPIC 10: BASIC PROBABILITY AND THE HOT HAND

TOPIC 10: BASIC PROBABILITY AND THE HOT HAND TOPIC 0: BASIC PROBABILITY AND THE HOT HAND The Hot Hand Debate Let s start with a basic question, much debated in sports circles: Does the Hot Hand really exist? A number of studies on this topic can

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 256 Introduction This procedure computes summary statistics and common non-parametric, single-sample runs tests for a series of n numeric, binary, or categorical data values. For numeric data,

More information

A Markov Model of Baseball: Applications to Two Sluggers

A Markov Model of Baseball: Applications to Two Sluggers A Markov Model of Baseball: Applications to Two Sluggers Mark Pankin INFORMS November 5, 2006 Pittsburgh, PA Notes are not intended to be a complete discussion or the text of my presentation. The notes

More information

THE VILLAGES REC DIVISION IV PROCEDURES WINTER 2019 Revised January 24, 2019

THE VILLAGES REC DIVISION IV PROCEDURES WINTER 2019 Revised January 24, 2019 CONTENTS ARTICLE ONE: THE BOARD OF DIRECTORS (Pages 2-5) SECTION 1- Number of Members (2) SECTION 2- Qualifications (2) SECTION 3- Terms of Office (2-3) SECTION 4- Nominations (3) SECTION 5- Elections

More information

Average Runs per inning,

Average Runs per inning, Home Team Scoring Advantage in the First Inning Largely Due to Time By David W. Smith Presented June 26, 2015 SABR45, Chicago, Illinois Throughout baseball history, the home team has scored significantly

More information

THE VILLAGES REC DIVISION IV PROCEDURES WINTER 2018 Revised

THE VILLAGES REC DIVISION IV PROCEDURES WINTER 2018 Revised CONTENTS ARTICLE ONE: THE BOARD OF DIRECTORS (Pages 2-5) SECTION 1- Number of Members (2) SECTION 2- Qualifications (2) SECTION 3- Terms of Office (2-3) SECTION 4- Nominations (3) SECTION 5- Elections

More information

A Database Design for Selecting a Golden Glove Winner using Sabermetrics

A Database Design for Selecting a Golden Glove Winner using Sabermetrics , pp.38-42 http://dx.doi.org/10.14257/astl.2015.110.08 A Database Design for Selecting a Golden Glove Winner using Sabermetrics Wu-In Jang and Young-Ho Park *, Department of Multimedia Science, Sookmyung

More information

Wheaton Youth Baseball Pony League - Supplementary Rules

Wheaton Youth Baseball Pony League - Supplementary Rules Wheaton Youth Baseball Pony League - Supplementary Rules Revised & Approved: November 2018 Pony league play shall be governed by PONY League Baseball playing rules unless otherwise stated in these supplementary

More information

5.1 Introduction. Learning Objectives

5.1 Introduction. Learning Objectives Learning Objectives 5.1 Introduction Statistical Process Control (SPC): SPC is a powerful collection of problem-solving tools useful in achieving process stability and improving capability through the

More information

One of the most-celebrated feats

One of the most-celebrated feats Joe DiMaggio Done It Again and Again and Again and Again? David Rockoff and Philip Yates Joe DiMaggio done it again! Joe DiMaggio done it again! Clackin that bat, gone with the wind! Joe DiMaggio s done

More information

PGA Tour Scores as a Gaussian Random Variable

PGA Tour Scores as a Gaussian Random Variable PGA Tour Scores as a Gaussian Random Variable Robert D. Grober Departments of Applied Physics and Physics Yale University, New Haven, CT 06520 Abstract In this paper it is demonstrated that the scoring

More information

OFFICIAL RULEBOOK. Version 1.16

OFFICIAL RULEBOOK. Version 1.16 OFFICIAL RULEBOOK Version.6 3. Types of Cards Player Cards...4 Strategy Cards...8 Stadium Cards...9 2. Deck Building Team Roster...0 Strategy Deck...0 Stadium Selection... 207 CLUTCH BASEBALL ALL RIGHTS

More information

Antelope Little League

Antelope Little League Antelope Little League Scorekeeper Training Thank you for volunteering to be a scorekeeper! It s an essential role, not only for keeping track of the score but also for the safety of the players. Being

More information

WHEATON YOUTH BASEBALL BRONCO LEAGUE SUPPLEMENTARY RULES

WHEATON YOUTH BASEBALL BRONCO LEAGUE SUPPLEMENTARY RULES WHEATON YOUTH BASEBALL BRONCO LEAGUE SUPPLEMENTARY RULES Revised & Approved: November 2018 League play will be governed by PONY League Baseball playing rules unless otherwise stated in these supplementary

More information

At each type of conflict location, the risk is affected by certain parameters:

At each type of conflict location, the risk is affected by certain parameters: TN001 April 2016 The separated cycleway options tool (SCOT) was developed to partially address some of the gaps identified in Stage 1 of the Cycling Network Guidance project relating to separated cycleways.

More information

ECO 199 GAMES OF STRATEGY Spring Term 2004 Precept Materials for Week 3 February 16, 17

ECO 199 GAMES OF STRATEGY Spring Term 2004 Precept Materials for Week 3 February 16, 17 ECO 199 GAMES OF STRATEGY Spring Term 2004 Precept Materials for Week 3 February 16, 17 Illustration of Rollback in a Decision Problem, and Dynamic Games of Competition Here we discuss an example whose

More information

A PRIMER ON BAYESIAN STATISTICS BY T. S. MEANS

A PRIMER ON BAYESIAN STATISTICS BY T. S. MEANS A PRIMER ON BAYESIAN STATISTICS BY T. S. MEANS 1987, 1990, 1993, 1999, 2011 A PRIMER ON BAYESIAN STATISTICS BY T. S. MEANS DEPARTMENT OF ECONOMICS SAN JOSE STATE UNIVERSITY SAN JOSE, CA 95192-0114 This

More information

One could argue that the United States is sports driven. Many cities are passionate and

One could argue that the United States is sports driven. Many cities are passionate and Hoque 1 LITERATURE REVIEW ADITYA HOQUE INTRODUCTION One could argue that the United States is sports driven. Many cities are passionate and centered around their sports teams. Sports are also financially

More information

How Effective is Change of Pace Bowling in Cricket?

How Effective is Change of Pace Bowling in Cricket? How Effective is Change of Pace Bowling in Cricket? SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries.

More information

2015 GTAAA Jr. Bulldogs Memorial Day Tournament

2015 GTAAA Jr. Bulldogs Memorial Day Tournament 2015 GTAAA Jr. Bulldogs Memorial Day Tournament 9U and 10U Rules General Rules 1. Players must be a full-time member of their respective in-house baseball organization with the team roster comprised of

More information

Wheaton Youth Baseball Pony League - Supplementary Rules

Wheaton Youth Baseball Pony League - Supplementary Rules Wheaton Youth Baseball Pony League - Supplementary Rules Revised & Approved: January 2018 Pony league play shall be governed by PONY League Baseball playing rules unless otherwise stated in these supplementary

More information

INFORMS Transactions on Education

INFORMS Transactions on Education This article was downloaded by: [46.3.197.130] On: 10 February 2018, At: 06:16 Publisher: Institute for Operations Research and the Management Sciences (INFORMS) INFORMS is located in Maryland, USA INFORMS

More information

March Madness Basketball Tournament

March Madness Basketball Tournament March Madness Basketball Tournament Math Project COMMON Core Aligned Decimals, Fractions, Percents, Probability, Rates, Algebra, Word Problems, and more! To Use: -Print out all the worksheets. -Introduce

More information

March Madness Basketball Tournament

March Madness Basketball Tournament March Madness Basketball Tournament Math Project COMMON Core Aligned Decimals, Fractions, Percents, Probability, Rates, Algebra, Word Problems, and more! To Use: -Print out all the worksheets. -Introduce

More information

NPYL Major Division Rules Boys 11 & 12 Years of Age Revised April 2015

NPYL Major Division Rules Boys 11 & 12 Years of Age Revised April 2015 NPYL Major Division Rules Boys 11 & 12 Years of Age Revised April 2015 The governing rules of play will be officially recognized Cal Ripken baseball rules with the following exceptions and/or local rules

More information

2015 NATIONAL BASEBALL ARBITRATION COMPETITION

2015 NATIONAL BASEBALL ARBITRATION COMPETITION 2015 NATIONAL BASEBALL ARBITRATION COMPETITION Arizona Diamondbacks v. Mark Trumbo Submission on Behalf of Arizona Diamondbacks Midpoint: $5,900,000 Submission by Team: 5 Table of Contents I. Introduction

More information

Analysis of Variance. Copyright 2014 Pearson Education, Inc.

Analysis of Variance. Copyright 2014 Pearson Education, Inc. Analysis of Variance 12-1 Learning Outcomes Outcome 1. Understand the basic logic of analysis of variance. Outcome 2. Perform a hypothesis test for a single-factor design using analysis of variance manually

More information

Lakeshore Baseball and Softball Association

Lakeshore Baseball and Softball Association Lakeshore Baseball and Softball Association A. Rules for Minors Baseball (Revised April 2015) Except where specifically amended in this document, the rules of the National Federation of State High School

More information

Energy capture performance

Energy capture performance Energy capture performance Cost of energy is a critical factor to the success of marine renewables, in order for marine renewables to compete with other forms of renewable and fossil-fuelled power generation.

More information

An average pitcher's PG = 50. Higher numbers are worse, and lower are better. Great seasons will have negative PG ratings.

An average pitcher's PG = 50. Higher numbers are worse, and lower are better. Great seasons will have negative PG ratings. Fastball 1-2-3! This simple game gives quick results on the outcome of a baseball game in under 5 minutes. You roll 3 ten-sided dice (10d) of different colors. If the die has a 10 on it, count it as 0.

More information

Calvary A.A. Baseball

Calvary A.A. Baseball Calvary A.A. Baseball Tournament Rules Inclement Weather Policy In the event of rain we will do everything within our power to make up games and stay as close to the original game schedule as possible.

More information

Stafford Little League Softball Bi-laws & Local Rules 2016 Season

Stafford Little League Softball Bi-laws & Local Rules 2016 Season Stafford Little League Softball Bi-laws & Local Rules 2016 Season Page 1 of 11 Table of Contents 1. INTRODUCTION... 3 2. DRAFT RULES & ASSESSMENT... 3 3. DIVISION RULES... 5 4. CHAMPIONSHIP BY DIVISION

More information

The next criteria will apply to partial tournaments. Consider the following example:

The next criteria will apply to partial tournaments. Consider the following example: Criteria for Assessing a Ranking Method Final Report: Undergraduate Research Assistantship Summer 2003 Gordon Davis: dagojr@email.arizona.edu Advisor: Dr. Russel Carlson One of the many questions that

More information

2014 Tulane Baseball Arbitration Competition Josh Reddick v. Oakland Athletics (MLB)

2014 Tulane Baseball Arbitration Competition Josh Reddick v. Oakland Athletics (MLB) 2014 Tulane Baseball Arbitration Competition Josh Reddick v. Oakland Athletics (MLB) Submission on Behalf of the Oakland Athletics Team 15 Table of Contents I. INTRODUCTION AND REQUEST FOR HEARING DECISION...

More information

Fairfax Little League PPR Input Guide

Fairfax Little League PPR Input Guide Fairfax Little League PPR Input Guide Each level has different participation requirements. Please refer to the League Bylaws section 7 for specific details. Player Participation Records (PPR) will be reported

More information

Jonathan White Paper Title: An Analysis of the Relationship between Pressure and Performance in Major League Baseball Players

Jonathan White Paper Title: An Analysis of the Relationship between Pressure and Performance in Major League Baseball Players Jonathan White Paper Title: An Analysis of the Relationship between Pressure and Performance in Major League Baseball Players If you were to scrutinize Alex Rodriguez s statistics during the 2006 season,

More information

Draft - 4/17/2004. A Batting Average: Does It Represent Ability or Luck?

Draft - 4/17/2004. A Batting Average: Does It Represent Ability or Luck? A Batting Average: Does It Represent Ability or Luck? Jim Albert Department of Mathematics and Statistics Bowling Green State University albert@bgnet.bgsu.edu ABSTRACT Recently Bickel and Stotz (2003)

More information

Background Information. Project Instructions. Problem Statement. EXAM REVIEW PROJECT Microsoft Excel Review Baseball Hall of Fame Problem

Background Information. Project Instructions. Problem Statement. EXAM REVIEW PROJECT Microsoft Excel Review Baseball Hall of Fame Problem Background Information Every year, the National Baseball Hall of Fame conducts an election to select new inductees from candidates nationally recognized for their talent or association with the sport of

More information

BERKSHIRE II: AN EXPERIENTIAL DECISION MAKING EXERCISE. Tom F. Badgett, Texas Christian University Halsey R. Jones, Texas Christian University

BERKSHIRE II: AN EXPERIENTIAL DECISION MAKING EXERCISE. Tom F. Badgett, Texas Christian University Halsey R. Jones, Texas Christian University BERKSHIRE II: AN EXPERIENTIAL DECISION MAKING EXERCISE Tom F. Badgett, Texas Christian University Halsey R. Jones, Texas Christian University This paper describes a competitive experiential exercise which

More information

ROSE-HULMAN INSTITUTE OF TECHNOLOGY Department of Mechanical Engineering. Mini-project 3 Tennis ball launcher

ROSE-HULMAN INSTITUTE OF TECHNOLOGY Department of Mechanical Engineering. Mini-project 3 Tennis ball launcher Mini-project 3 Tennis ball launcher Mini-Project 3 requires you to use MATLAB to model the trajectory of a tennis ball being shot from a tennis ball launcher to a player. The tennis ball trajectory model

More information

Fairfield National Little League AA Rules (updated: Spring 2014)

Fairfield National Little League AA Rules (updated: Spring 2014) With a few exceptions as noted below, we will be following Little League Baseball Rules. Unless noted below, standard Little League Baseball Rules govern (e.g. Green Book). General Rules: Uniforms: Coaches:

More information

TOP OF THE TENTH Instructions

TOP OF THE TENTH Instructions Instructions is based on the original Extra Innings which was developed by Jack Kavanaugh with enhancements from various gamers, as well as many ideas I ve had bouncing around in my head since I started

More information

Figure 1. Winning percentage when leading by indicated margin after each inning,

Figure 1. Winning percentage when leading by indicated margin after each inning, The 7 th Inning Is The Key By David W. Smith Presented June, 7 SABR47, New York, New York It is now nearly universal for teams with a 9 th inning lead of three runs or fewer (the definition of a save situation

More information

2014 National Baseball Arbitration Competition

2014 National Baseball Arbitration Competition 2014 National Baseball Arbitration Competition Jeff Samardzija v. Chicago Cubs Submission on Behalf of Chicago Cubs Midpoint: $4.9 million Submission by: Team 26 Table of Contents I. Introduction and Request

More information

ISCORE INTEGRATION IOS SCORING GUIDE

ISCORE INTEGRATION IOS SCORING GUIDE ISCORE INTEGRATION IOS SCORING GUIDE TABLE OF CONTENTS TABLE OF CONTENTS... 2 INTRODUCTION... 4 INTEGRATION REQUIRMENTS... 4 GETTING STARTED... 4 Discover Games... 4 GAME INFO... 5 Game Info Options...

More information

The pth percentile of a distribution is the value with p percent of the observations less than it.

The pth percentile of a distribution is the value with p percent of the observations less than it. Describing Location in a Distribution (2.1) Measuring Position: Percentiles One way to describe the location of a value in a distribution is to tell what percent of observations are less than it. De#inition:

More information

Ranking teams in partially-disjoint tournaments

Ranking teams in partially-disjoint tournaments Ranking teams in partially-disjoint tournaments Alex Choy Mentor: Chris Jones September 16, 2013 1 Introduction Throughout sports, whether it is professional or collegiate sports, teams are ranked. In

More information

Examples of Carter Corrected DBDB-V Applied to Acoustic Propagation Modeling

Examples of Carter Corrected DBDB-V Applied to Acoustic Propagation Modeling Naval Research Laboratory Stennis Space Center, MS 39529-5004 NRL/MR/7182--08-9100 Examples of Carter Corrected DBDB-V Applied to Acoustic Propagation Modeling J. Paquin Fabre Acoustic Simulation, Measurements,

More information

PHYS Tutorial 7: Random Walks & Monte Carlo Integration

PHYS Tutorial 7: Random Walks & Monte Carlo Integration PHYS 410 - Tutorial 7: Random Walks & Monte Carlo Integration The goal of this tutorial is to model a random walk in two dimensions and observe a phase transition as parameters are varied. Additionally,

More information

Softball

Softball The directors at Elite Sports would like to welcome you to our association. We will strive to produce the highest quality tournaments possible. Unlike many associations, we like to hear your comments and

More information

2014 NATIONAL BASEBALL ARBITRATION COMPETITION ERIC HOSMER V. KANSAS CITY ROYALS (MLB) SUBMISSION ON BEHALF OF THE CLUB KANSAS CITY ROYALS

2014 NATIONAL BASEBALL ARBITRATION COMPETITION ERIC HOSMER V. KANSAS CITY ROYALS (MLB) SUBMISSION ON BEHALF OF THE CLUB KANSAS CITY ROYALS 2014 NATIONAL BASEBALL ARBITRATION COMPETITION ERIC HOSMER V. KANSAS CITY ROYALS (MLB) SUBMISSION ON BEHALF OF THE CLUB KANSAS CITY ROYALS Player Demand: $4.00 Million Club Offer: $3.30 Million Midpoint:

More information

Scorekeeping Guide Book

Scorekeeping Guide Book Scorekeeping Guide Book Courtesy of East Orange Babe Ruth Table of Contents Page 1. Starting the Scorecard for a Game...1 2. The Scorecard Layout...2 Individual and Game Totals...2 3. Scorekeeping Basics...3

More information

City of Palo Alto ADULT SOFTBALL RULES

City of Palo Alto ADULT SOFTBALL RULES City of Palo Alto ADULT SOFTBALL RULES I. TEAM ROSTER / PLAYER ELIGIBILITY A. Each team must have a minimum of 12 and a maximum of 25 players, which must include the manager, provided he/she is a playing

More information
