Measuring Batting Performance

Authors: Samantha Attar, Hannah Dineen, Andy Fullerton, Nora Hanson, Cam Kelso, Katie McLaughlin, and Caitlyn Nolan

Introduction:
The following analysis compares how slugging percentage, on-base percentage, and batting average relate to runs scored in the Steroid Era and the post-steroid Era. Because of the effects of steroids, our group hypothesized that all three batting performance statistics would be associated with a greater number of runs scored during the Steroid Era than during the post-steroid Era.

Methods:
We relied on the Lahman Teams data frame to compare slugging percentage, on-base percentage, and batting average with runs scored. To determine whether there was a difference between the eras, we defined the Steroid Era as 1980-2005 and the post-steroid Era as 2006-2014. After filtering to seasons after 1980 and before 2014 and dropping team-seasons with missing hit-by-pitch, sacrifice-fly, or caught-stealing counts, the data set contains 420 team-seasons beginning in 2000: 180 in the Steroid Era and 240 in the post-steroid Era. We used dplyr to manipulate the data and ggplot to make figures comparing the eras. For each statistic, we fit a linear model of runs on the statistic, era, and their interaction (an analysis of covariance) and tested the null hypothesis that the slope is the same in both eras. We then fit a separate linear regression for each era to compare the slopes directly.

Findings:
For slugging percentage, the interaction between the statistic and era was not significant (p = 0.901), so we found no evidence that the slope differs between eras. In contrast, the interaction was significant for on-base percentage (p = 0.0162) and for batting average (p = 0.000772), both of which are less than 0.05. Therefore, we can reject the null hypothesis that the slopes for the two eras are the same for on-base percentage and batting average, but not for slugging percentage. During the Steroid Era, the slope of on-base percentage vs. runs was 5689.62, while the slope was 5043.8 during the post-steroid Era.
During the Steroid Era, the slope of batting average vs. runs was 6555.22, while the slope was 5123.17 during the post-steroid Era. For slugging percentage, the slopes were similar: 3005.6 in the Steroid Era and 2947.61 in the post-steroid Era.

Discussion/overview/implications:
The steeper slopes of the regression lines for on-base percentage and batting average in the Steroid Era indicate that a given gain in either statistic yielded more runs in the Steroid Era than in the post-steroid Era, which is consistent with our hypothesis that players taking steroids produced more runs than those not taking steroids. Players taking steroids were likely stronger than players in the post-steroid Era, so they may have been getting more hits because they could hit the ball farther and harder. A higher batting average would in turn raise on-base percentage. Perhaps we cannot reject the null hypothesis for slugging percentage because players who would normally hit a high number of home runs would simply hit them farther with the help of steroids.

library(Lahman)
library(dplyr)
library(ggplot2)

# Assumption: the output printed below shows the ID columns in lowercase
# (yearid, teamid, ...), so they are lowercased here to match
names(Teams) <- ifelse(grepl("ID", names(Teams)), tolower(names(Teams)), names(Teams))

head(Teams)

  yearid lgid teamid franchid divid Rank  G Ghome  W  L DivWin WCWin LgWin
1   1871   NA    BS1      BNA  <NA>    3 31    NA 20 10   <NA>  <NA>     N
2   1871   NA    CH1      CNA  <NA>    2 28    NA 19  9   <NA>  <NA>     N
3   1871   NA    CL1      CFC  <NA>    8 29    NA 10 19   <NA>  <NA>     N
4   1871   NA    FW1      KEK  <NA>    7 19    NA  7 12   <NA>  <NA>     N
5   1871   NA    NY2      NNA  <NA>    5 33    NA 16 17   <NA>  <NA>     N
6   1871   NA    PH1      PNA  <NA>    1 28    NA 21  7   <NA>  <NA>     Y
  WSWin   R   AB   H X2B X3B HR BB SO SB CS HBP SF  RA  ER  ERA CG SHO SV
1  <NA> 401 1372 426  70  37  3 60 19 73 NA  NA NA 303 109 3.55 22   1  3
2  <NA> 302 1196 323  52  21 10 60 22 69 NA  NA NA 241  77 2.76 25   0  1
3  <NA> 249 1186 328  35  40  7 26 25 18 NA  NA NA 341 116 4.11 23   0  0
4  <NA> 137  746 178  19   8  2 33  9 16 NA  NA NA 243  97 5.17 19   1  0
5  <NA> 302 1404 403  43  21  1 33 15 46 NA  NA NA 313 121 3.72 32   1  0
6  <NA> 376 1281 410  66  27  9 46 23 56 NA  NA NA 266 137 4.95 27   0  0
  IPouts  HA HRA BBA SOA   E DP   FP                    name
1    828 367   2  42  23 225 NA 0.83    Boston Red Stockings
2    753 308   6  28  22 218 NA 0.82 Chicago White Stockings
3    762 346  13  53  34 223 NA 0.81  Cleveland Forest Citys
4    507 261   5  21  17 163 NA 0.80    Fort Wayne Kekiongas
5    879 373   7  42  22 227 NA 0.83        New York Mutuals
6    747 329   3  53  16 194 NA 0.84  Philadelphia Athletics
                          park attendance BPF PPF teamidbr teamidlahman45
1          South End Grounds I         NA 103  98      BOS            BS1
2      Union Base-Ball Grounds         NA 104 102      CHI            CH1
3 National Association Grounds         NA  96 100      CLE            CL1
4               Hamilton Field         NA 101 107      KEK            FW1
5     Union Grounds (Brooklyn)         NA  90  88      NYU            NY2
6     Jefferson Street Grounds         NA 102  98      ATH            PH1
  teamidretro
1         BS1
2         CH1
3         CL1
4         FW1
5         NY2
6         PH1

# Derived columns are named in lowercase because the plots and models below
# refer to them as ba, obp, slg, and r
tm.batting <- Teams %>%
  select(-(Rank:WSWin), -(RA:teamidretro)) %>%
  filter(yearid > 1980, yearid < 2014) %>%
  filter(!is.na(HBP), !is.na(SF), !is.na(CS)) %>%
  group_by(yearid, teamid) %>%
  summarize(ba  = round(H/AB, 3),
            pa  = AB + BB + HBP + SF,
            obp = (H + BB + HBP)/pa,
            x1b = H - X2B - X3B - HR,
            # As written, only the 4*HR term is divided by AB;
            # the tb and slg values printed below reflect this expression
            tb  = x1b + 2*X2B + 3*X3B + 4*HR/AB,
            slg = round(tb/AB, 3),
            ops = obp + slg,
            iso = slg - ba,
            # As written, only CS is divided by (AB - H)
            tav = (tb + HBP + BB + SB) - CS/(AB - H) + CS,
            rc  = (H + BB - CS)*(tb + 0.55*SB)/(AB + BB),
            bra = obp*slg,
            sor = SO/pa,
            wr  = BB/pa,
            r   = R)

head(tm.batting)

Source: local data frame [6 x 16]
Groups: yearid [1]

  yearid teamid    ba    pa       obp   x1b       tb   slg       ops   iso
   (int) (fctr) (dbl) (int)     (dbl) (int)    (dbl) (dbl)     (dbl) (dbl)
1   2000    ANA 0.280  6326 0.3523554   995 1715.168 0.305 0.6573554 0.025
2   2000    ARI 0.265  6179 0.3333873   961 1657.130 0.300 0.6333873 0.035
3   2000    ATL 0.271  6188 0.3464771  1011 1637.130 0.298 0.6444771 0.027
4   2000    BAL 0.272  6210 0.3405797   992 1678.133 0.302 0.6425797 0.030
5   2000    BOS 0.267  6331 0.3405465   988 1716.119 0.305 0.6455465 0.038
6   2000    CHA 0.286  6351 0.3556920  1041 1790.153 0.317 0.6726920 0.031
Variables not shown: tav (dbl), rc (dbl), bra (dbl), sor (dbl), wr (dbl), r (int)

# Seasons before 2006 are labeled as the Steroid Era
tm.batting$era <- ifelse(tm.batting$yearid < 2006, "steroid", "post")

ggplot(tm.batting, aes(slg, r)) + geom_point() + stat_smooth(method = "lm") + facet_grid(era ~ .)
ggplot(tm.batting, aes(obp, r)) + geom_point() + stat_smooth(method = "lm") + facet_grid(era ~ .)

ggplot(tm.batting, aes(ba, r)) + geom_point() + stat_smooth(method = "lm") + facet_grid(era ~ .)
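
The total-bases values printed by head(tm.batting) are fractional (for example 1715.168 for ANA in 2000), although total bases is conventionally a whole-number count. That is consistent with operator precedence in the expression X1B+2*X2B+3*X3B+4*HR/AB, where only the 4*HR term is divided by AB. The sketch below illustrates the difference with made-up hit counts; the numbers are hypothetical and are not taken from the Lahman data.

```r
# Hypothetical counts for one team-season, chosen only for illustration
AB  <- 5000
X1B <- 1000
X2B <- 300
X3B <- 30
HR  <- 200

# Expression as it appears in the pipeline: division binds tighter than
# addition, so only 4*HR is divided by AB
tb_as_written <- X1B + 2*X2B + 3*X3B + 4*HR/AB

# Conventional total bases: every hit weighted by the bases it earns
tb_conventional <- X1B + 2*X2B + 3*X3B + 4*HR

# Slugging percentage under each definition
slg_as_written   <- round(tb_as_written / AB, 3)
slg_conventional <- round(tb_conventional / AB, 3)

print(c(tb_as_written = tb_as_written, tb_conventional = tb_conventional))
print(c(slg_as_written = slg_as_written, slg_conventional = slg_conventional))
```

Under the expression as written, home runs contribute almost nothing to slugging percentage, which may matter when interpreting the slugging results.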
glimpse(tm.batting)

Observations: 420
Variables: 17
$ yearid (int) 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2...
$ teamid (fctr) ANA, ARI, ATL, BAL, BOS, CHA, CHN, CIN, CLE, COL, DET,...
$ ba     (dbl) 0.280, 0.265, 0.271, 0.272, 0.267, 0.286, 0.256, 0.274,...
$ pa     (int) 6326, 6179, 6188, 6210, 6331, 6351, 6308, 6316, 6471, 6...
$ obp    (dbl) 0.3523554, 0.3333873, 0.3464771, 0.3405797, 0.3405465,...
$ x1b    (int) 995, 961, 1011, 992, 988, 1041, 948, 1007, 1078, 1130,...
$ tb     (dbl) 1715.168, 1657.130, 1637.130, 1678.133, 1716.119, 1790...
$ slg    (dbl) 0.305, 0.300, 0.298, 0.302, 0.305, 0.317, 0.280, 0.305,...
$ ops    (dbl) 0.6573554, 0.6333873, 0.6444771, 0.6425797, 0.6455465,...
$ iso    (dbl) 0.025, 0.035, 0.027, 0.030, 0.038, 0.031, 0.024, 0.031,...
$ tav    (dbl) 2515.155, 2392.119, 2495.116, 2476.117, 2442.111, 2595...
$ rc     (dbl) 603.3125, 552.1954, 573.1259, 572.5582, 580.9450, 643.8...
$ bra    (dbl) 0.10746838, 0.10001618, 0.10325016, 0.10285507, 0.10386...
$ sor    (dbl) 0.1618716, 0.1577925, 0.1632191, 0.1449275, 0.1609540,...
$ wr     (dbl) 0.09611129, 0.08658359, 0.09615385, 0.08985507, 0.09650...
$ r      (int) 864, 792, 810, 794, 792, 978, 764, 825, 950, 968, 823,...
$ era    (chr) "steroid", "steroid", "steroid", "steroid", "steroid",...
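
The models that follow include a term for the product of the batting statistic and era. As a sketch of why that term tests for a difference in slopes, the example below fits an interaction model and two per-era models to simulated data and compares them. Everything in it is an assumption made for illustration: the data frame sim, the variables x and y, and the "true" slopes are made up and are not the Lahman data.

```r
# Simulated data only: x, y, era and the "true" slopes are made up
set.seed(42)
n   <- 150
era <- rep(c("post", "steroid"), each = n)
x   <- runif(2 * n, min = 0.30, max = 0.36)
true_slope <- ifelse(era == "steroid", 5700, 5000)
y   <- -1000 + true_slope * x + rnorm(2 * n, sd = 30)
sim <- data.frame(x = x, y = y, era = era)

# One model with an interaction term; "post" is the reference level
# because the levels of a character variable are ordered alphabetically
fit_both <- lm(y ~ x * era, data = sim)

# Separate fits for each era
fit_post    <- lm(y ~ x, data = sim[sim$era == "post", ])
fit_steroid <- lm(y ~ x, data = sim[sim$era == "steroid", ])

# The interaction coefficient equals the gap between the per-era slopes
slope_gap        <- unname(coef(fit_steroid)["x"] - coef(fit_post)["x"])
interaction_coef <- unname(coef(fit_both)["x:erasteroid"])

# Its p-value tests the null hypothesis that the two slopes are equal
interaction_p <- summary(fit_both)$coefficients["x:erasteroid", "Pr(>|t|)"]

print(c(slope_gap = slope_gap, interaction_coef = interaction_coef))
print(interaction_p)
```

Writing the formula as y ~ x * era with a data argument is equivalent to spelling out both main effects plus the product term, and it keeps the coefficient names short.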
model.slg <- lm(tm.batting$r~tm.batting$slg+tm.batting$era+tm.batting$slg*tm.batting$era)
summary(model.slg)

Call:
lm(formula = tm.batting$r ~ tm.batting$slg + tm.batting$era +
    tm.batting$slg * tm.batting$era)

Residuals:
     Min       1Q   Median       3Q      Max
-171.066  -49.620   -0.721   45.884  206.946

Coefficients:
                                     Estimate Std. Error t value Pr(>|t|)
(Intercept)                           -136.43      80.79  -1.689    0.092 .
tm.batting$slg                        2947.61     274.30  10.746   <2e-16 ***
tm.batting$erasteroid                   16.90     138.13   0.122    0.903
tm.batting$slg:tm.batting$erasteroid    57.96     465.92   0.124    0.901
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 67.98 on 416 degrees of freedom
Multiple R-squared: 0.3466, Adjusted R-squared: 0.3419
F-statistic: 73.55 on 3 and 416 DF, p-value: < 2.2e-16

model.obp <- lm(tm.batting$r~tm.batting$obp+tm.batting$era+tm.batting$obp*tm.batting$era)
summary(model.obp)

Call:
lm(formula = tm.batting$r ~ tm.batting$obp + tm.batting$era +
    tm.batting$obp * tm.batting$era)

Residuals:
    Min      1Q  Median      3Q     Max
-97.983 -23.481  -0.126  22.854 119.841

Coefficients:
                                     Estimate Std. Error t value Pr(>|t|)
(Intercept)                           -921.16      55.32 -16.652   <2e-16 ***
tm.batting$obp                        5043.82     168.77  29.885   <2e-16 ***
tm.batting$erasteroid                 -206.71      88.74  -2.329   0.0203 *
tm.batting$obp:tm.batting$erasteroid   645.79     267.45   2.415   0.0162 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 36.53 on 416 degrees of freedom
Multiple R-squared: 0.8113, Adjusted R-squared: 0.81
F-statistic: 596.3 on 3 and 416 DF, p-value: < 2.2e-16

model.ba <- lm(tm.batting$r~tm.batting$ba+tm.batting$era+tm.batting$ba*tm.batting$era)
summary(model.ba)

Call:
lm(formula = tm.batting$r ~ tm.batting$ba + tm.batting$era +
    tm.batting$ba * tm.batting$era)

Residuals:
     Min       1Q   Median       3Q      Max
-124.167  -33.029   -1.377   32.181  140.537

Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)
(Intercept)                          -603.58      65.34  -9.237  < 2e-16 ***
tm.batting$ba                        5123.17     250.65  20.439  < 2e-16 ***
tm.batting$erasteroid                -359.87     111.44  -3.229 0.001340 **
tm.batting$ba:tm.batting$erasteroid  1432.05     422.73   3.388 0.000772 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 47.78 on 416 degrees of freedom
Multiple R-squared: 0.6772, Adjusted R-squared: 0.6748
F-statistic: 290.9 on 3 and 416 DF, p-value: < 2.2e-16

# Split by era; the labels must match the lowercase values assigned to era
steroid <- tm.batting %>% filter(era == "steroid")
post <- tm.batting %>% filter(era == "post")

slg.steroid <- lm(steroid$r~steroid$slg)
summary(slg.steroid)

Call:
lm(formula = steroid$r ~ steroid$slg)

Residuals:
    Min      1Q  Median      3Q     Max
-171.07  -56.21    0.34   47.27  206.95

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -119.5      120.7  -0.990    0.323
steroid$slg   3005.6      405.8   7.407 4.98e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 73.24 on 178 degrees of freedom
Multiple R-squared: 0.2356, Adjusted R-squared: 0.2313
F-statistic: 54.87 on 1 and 178 DF, p-value: 4.981e-12

slg.post <- lm(post$r~post$slg)
summary(slg.post)

Call:
lm(formula = post$r ~ post$slg)

Residuals:
     Min       1Q   Median       3Q      Max
-150.540  -47.371   -0.721   42.721  169.260

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -136.43      75.78   -1.80   0.0731 .
post$slg     2947.61     257.28   11.46   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 63.76 on 238 degrees of freedom
Multiple R-squared: 0.3555, Adjusted R-squared: 0.3528
F-statistic: 131.3 on 1 and 238 DF, p-value: < 2.2e-16

obp.steroid <- lm(steroid$r~steroid$obp)
summary(obp.steroid)

Call:
lm(formula = steroid$r ~ steroid$obp)

Residuals:
    Min      1Q  Median      3Q     Max
-91.420 -21.660   0.879  22.756 119.841

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -1127.87      70.57  -15.98   <2e-16 ***
steroid$obp  5689.62     210.99   26.97   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 37.15 on 178 degrees of freedom
Multiple R-squared: 0.8034, Adjusted R-squared: 0.8022
F-statistic: 727.2 on 1 and 178 DF, p-value: < 2.2e-16

obp.post <- lm(post$r~post$obp)
summary(obp.post)

Call:
lm(formula = post$r ~ post$obp)

Residuals:
    Min      1Q  Median      3Q     Max
-97.983 -24.008  -0.887  23.020 101.782

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -921.2       54.6  -16.87   <2e-16 ***
post$obp      5043.8      166.6   30.27   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 36.06 on 238 degrees of freedom
Multiple R-squared: 0.7939, Adjusted R-squared: 0.793
F-statistic: 916.6 on 1 and 238 DF, p-value: < 2.2e-16

ba.steroid <- lm(steroid$r~steroid$ba)
summary(ba.steroid)

Call:
lm(formula = steroid$r ~ steroid$ba)

Residuals:
     Min       1Q   Median       3Q      Max
-108.463  -32.939   -3.851   30.619  140.537

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -963.45      89.83  -10.72   <2e-16 ***
steroid$ba   6555.22     338.74   19.35   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 47.55 on 178 degrees of freedom
Multiple R-squared: 0.6778, Adjusted R-squared: 0.676
F-statistic: 374.5 on 1 and 178 DF, p-value: < 2.2e-16

ba.post <- lm(post$r~post$ba)
summary(ba.post)

Call:
lm(formula = post$r ~ post$ba)

Residuals:
     Min       1Q   Median       3Q      Max
-124.167  -33.075    2.173   33.579  140.158

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -603.58      65.58  -9.204   <2e-16 ***
post$ba      5123.17     251.56  20.365   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 47.96 on 238 degrees of freedom
Multiple R-squared: 0.6354, Adjusted R-squared: 0.6339
F-statistic: 414.7 on 1 and 238 DF, p-value: < 2.2e-16
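
The interaction models and the per-era regressions above describe the same fitted lines, so the post-era slope plus the interaction coefficient should reproduce the Steroid Era slope. The following sketch checks that arithmetic using only coefficients copied from the printed summaries; the data frame coefs and its column names are introduced here for illustration.

```r
# Coefficients copied from the summaries printed above
coefs <- data.frame(
  stat        = c("slg", "obp", "ba"),
  post_slope  = c(2947.61, 5043.82, 5123.17),  # slope for the reference era ("post")
  interaction = c(57.96, 645.79, 1432.05),     # statistic:erasteroid coefficient
  steroid_fit = c(3005.6, 5689.62, 6555.22)    # slope from the steroid-only fits
)

# Steroid Era slope implied by the interaction model
coefs$implied_steroid <- coefs$post_slope + coefs$interaction

# Difference from the separately fitted slope (rounding error only)
coefs$gap <- coefs$implied_steroid - coefs$steroid_fit

print(coefs)
```

The implied and separately fitted slopes agree to within the rounding of the printed output, so the interaction p-values (0.901, 0.0162, and 0.000772) can be read directly as tests of the slope differences reported in the findings.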