Model Selection Erwan Le Pennec Fall 2015


    library("dplyr")
    library("ggplot2")
    library("ggfortify")
    library("reshape2")

Model Selection

We will now use another classical dataset, birthwt, which comes from a study on risk factors associated with low infant birth weight conducted at Baystate Medical Center, Springfield, Mass., during 1986. It consists of 189 observations of 10 variables.

Variable   Content
low        indicator of birth weight less than 2.5 kg.
age        mother's age in years.
lwt        mother's weight in pounds at last menstrual period.
race       mother's race (1 = white, 2 = black, 3 = other).
smoke      smoking status during pregnancy.
ptl        number of previous premature labors.
ht         history of hypertension.
ui         presence of uterine irritability.
ftv        number of physician visits during the first trimester.
bwt        birth weight in grams.

Our goal will be to predict bwt, the birth weight, from all the other variables (except low!).

1. Load the dataset from the package MASS and inspect it with glimpse.

    lbw <- MASS::birthwt
    glimpse(lbw)

    Observations: 189
    Variables: 10
    $ low (int) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
    $ age (int) 19, 33, 20, 21, 18, 21, 22, 17, 29, 26, 19, 19, 22, 30,...
    $ lwt (int) 182, 155, 105, 108, 107, 124, 118, 103, 123, 113, 95,...
    $ race (int) 2, 3, 1, 1, 1, 3, 1, 3, 1, 1, 3, 3, 3, 3, 1, 1, 2, 1, 3,...
    $ smoke (int) 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0,...
    $ ptl (int) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,...
    $ ht (int) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
    $ ui (int) 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1,...
    $ ftv (int) 0, 3, 1, 2, 0, 0, 1, 1, 1, 0, 0, 1, 0, 2, 0, 0, 0, 3, 0,...
    $ bwt (int) 2523, 2551, 2557, 2594, 2600, 2622, 2637, 2637, 2663,...

2. Fix the different factor issues.

    lbw <- mutate(lbw, low = factor(low, levels = c(0, 1), labels = c("normal", "low")))
    lbw <- mutate(lbw, race = factor(race, levels = c(1, 2, 3), labels = c("white", "black", "other")))
    lbw <- mutate(lbw, smoke = factor(smoke, levels = c(0, 1), labels = c("no", "yes")))
    lbw <- mutate(lbw, ht = factor(ht, levels = c(0, 1), labels = c("no", "yes")))
    lbw <- mutate(lbw, ui = factor(ui, levels = c(0, 1), labels = c("no", "yes")))
    lbw <- select(lbw, -low)
    glimpse(lbw)

    Observations: 189
    Variables: 9
    $ age (int) 19, 33, 20, 21, 18, 21, 22, 17, 29, 26, 19, 19, 22, 30,...
    $ lwt (int) 182, 155, 105, 108, 107, 124, 118, 103, 123, 113, 95,...
    $ race (fctr) black, other, white, white, white, other, white, other,...
    $ smoke (fctr) no, no, yes, yes, yes, no, no, no, yes, yes, no, no, no...
    $ ptl (int) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,...
    $ ht (fctr) no, no, no, no, no, no, no, no, no, no, no, no, yes, no...
    $ ui (fctr) yes, no, no, yes, yes, no, no, no, no, no, no, no, no,...
    $ ftv (int) 0, 3, 1, 2, 0, 0, 1, 1, 1, 0, 0, 1, 0, 2, 0, 0, 0, 3, 0,...
    $ bwt (int) 2523, 2551, 2557, 2594, 2600, 2622, 2637, 2637, 2663,...

3. Verify that the dataset does not contain any missing values.

    summary(lbw)

    age     Min. :14.00   1st Qu.:        Median :23.00   Mean :23.24   3rd Qu.:        Max. :45.00
    lwt     Min. : 80.0   1st Qu.:110.0   Median :121.0   Mean :129.8   3rd Qu.:        Max. :250.0
    race    white:96   black:26   other:67
    smoke   no :115   yes: 74
    ptl     Min. :        1st Qu.:        Median :         Mean :        3rd Qu.:        Max. :
    ht      no :177   yes: 12
    ui      no :161   yes: 28
    ftv     Min. :        1st Qu.:        Median :         Mean :        3rd Qu.:        Max. :
    bwt     Min. : 709    1st Qu.:2414    Median :2977     Mean :2945    3rd Qu.:3487    Max. :

4. Inspect visually all the variables independently.

    for (name in names(lbw)) {
      print(qplot(data = lbw, get(name), xlab = name))
    }
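For the missing-value check in step 3, reading the summary works, but a direct count per column is harder to misread. The following is a minimal sketch, assuming only that the MASS package is installed; the names lbw_raw, na_per_column and no_missing are illustrative and do not appear in the lab.

```r
# Sketch: count missing values per column of the raw birthwt data.
lbw_raw <- MASS::birthwt

# One count per variable; a non-zero entry flags a column with missing values.
na_per_column <- colSums(is.na(lbw_raw))
print(na_per_column)

# TRUE when no value is missing anywhere in the data frame.
no_missing <- all(na_per_column == 0)
print(no_missing)
```

The same two lines can be run on lbw after the factor conversion, since factor() would turn any unexpected level into NA.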

[Figures: count plots of age, lwt, race, smoke, ptl, ht, ui and ftv]

[Figure: count plot of bwt]

5. Inspect visually the relation between every variable and bwt. Can you infer the most useful variables?

    for (name in names(lbw)[-9]) {
      if (is.factor(lbw[[name]])) {
        # Categorical predictor: boxplot with jittered points
        print(ggplot(data = lbw, aes_string(x = name, y = "bwt")) +
                geom_boxplot() +
                geom_point(position = position_jitter(width = .1)))
      } else {
        # Numeric predictor: jittered scatter plot with a smoother
        print(ggplot(data = lbw, aes_string(x = name, y = "bwt")) +
                geom_point(position = position_jitter(width = .1)) +
                geom_smooth())
      }
    }

[Figures: bwt plotted against age, lwt, race, smoke, ptl, ht, ui and ftv]

6. Compute the full regression with all the variables and compute its summary (and maybe its diagnostic plots).

    reglbw <- lm(bwt ~ ., data = lbw)
    summary(reglbw)

    Call:
    lm(formula = bwt ~ ., data = lbw)

    Residuals:
    Min 1Q Median 3Q Max

    Coefficients:
    Estimate Std. Error t value Pr(>|t|)
    (Intercept) < 2e-16 ***
    age
    lwt *
    raceblack **
    raceother **
    smokeyes **
    ptl
    htyes **
    uiyes ***
    ftv
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: on 179 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:
    F-statistic: on 9 and 179 DF, p-value: 7.891e-08

    autoplot(reglbw)

[Figure: diagnostic plots of reglbw: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage]

7. Compute the trivial regression with no variables but the intercept, as a reference for a bad method.

    reglbwtriv <- lm(bwt ~ 1, data = lbw)
    summary(reglbwtriv)

    Call:
    lm(formula = bwt ~ 1, data = lbw)

    Residuals:
    Min 1Q Median 3Q Max

    Coefficients:
    Estimate Std. Error t value Pr(>|t|)
    (Intercept) <2e-16 ***
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: on 188 degrees of freedom

7. Create a function that, given an lm model, computes the empirical error, the debiased error, the cross-validation error, the deviance (-2 log-likelihood), the AIC criterion and the BIC criterion.

    V <- 5
    # T (the number of repetitions) is not assigned anywhere in this transcription;
    # if it is left unassigned, R's built-in T (TRUE) would be coerced to 1 repetition.
    LbwFolds <- caret::createMultiFolds(lbw[["bwt"]], k = V, times = T)
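The criteria named in the step above can be written out as follows. This is a sketch of the usual definitions rather than a quotation from the lab: n is the number of observations, p the number of regression coefficients, I_v the set of held-out observations in resample v, and k the number of estimated parameters as counted by R's logLik (for lm, the coefficients plus the residual variance). The factor (1 + 2p/n) in the debiased error is the common Cp-type correction and is an assumption here.

```latex
\begin{align*}
\mathrm{err} &= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i-\hat{y}_i\bigr)^2, \\
\mathrm{errcp} &= \mathrm{err}\,\Bigl(1+\frac{2p}{n}\Bigr), \\
\mathrm{errcv} &= \frac{1}{TV}\sum_{v=1}^{TV} e_v,
  \qquad e_v = \frac{1}{|I_v|}\sum_{i\in I_v}\bigl(y_i-\hat{y}_i^{(-v)}\bigr)^2, \\
\mathrm{errcvup} &= \mathrm{errcv} + 2\,\frac{\mathrm{sd}(e_1,\dots,e_{TV})}{\sqrt{TV}}, \\
\mathrm{LogLik} &= -2\log L(\hat{\theta}), \\
\mathrm{AIC} &= -2\log L(\hat{\theta}) + 2k, \\
\mathrm{BIC} &= -2\log L(\hat{\theta}) + k\log n .
\end{align*}
```

The first four quantities estimate a prediction error on the scale of squared grams, while the last three live on the log-likelihood scale, which is why the lab plots the two groups separately.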

    computeerrlm <- function(model, name) {
      # Empirical (training) mean squared error
      err <- mean((lbw[["bwt"]] - predict(model))^2)
      # Cp-type debiased error; the constants were lost in the source and the
      # factor (1 + 2 * p / n) is reconstructed from the usual correction
      errcp <- err * (1 + 2 * length(model[["coefficients"]]) / nrow(lbw))
      # Repeated V-fold cross-validation: refit on each training set, test on the rest
      errcvtmp <- matrix(0, nrow = 1, ncol = (T * V))
      for (v in 1:(T * V)) {
        lbwtrain <- slice(lbw, LbwFolds[[v]])
        lbwtest <- slice(lbw, -LbwFolds[[v]])
        regtmp <- lm(model, data = lbwtrain)
        predtmp <- predict(regtmp, newdata = lbwtest)
        errcvtmp[v] <- mean((lbwtest[["bwt"]] - predtmp)^2)
      }
      errcv <- mean(errcvtmp)
      # Upper bound: CV error plus two standard errors
      errcvup <- errcv + 2 * sd(errcvtmp) / sqrt(T * V)
      # Deviance (-2 log-likelihood), AIC and BIC
      LogLik <- -2 * as.numeric(logLik(model))
      LogLikAIC <- AIC(model)
      LogLikBIC <- BIC(model)
      data.frame(method = name, err = err, errcp = errcp, errcv = errcv, errcvup = errcvup,
                 LogLik = LogLik, LogLikAIC = LogLikAIC, LogLikBIC = LogLikBIC)
    }

8. Compute the errors of the trivial and the full model.

    errs <- computeerrlm(reglbwtriv, "Trivial")
    errs <- rbind(errs, computeerrlm(reglbw, "Full"))
    errs

    method err errcp errcv errcvup LogLik LogLikAIC LogLikBIC
    1 Trivial
    2 Full

9. Create a function that takes a data frame of errors for possibly several models and plots them. Test it on the full model.

    Plot_Err <- function(errs) {
      ggplot(data = melt(select(errs, -matches("loglik")), id.vars = "method"),
             aes(x = method, y = value, color = variable)) +
        geom_point(size = 5) +
        theme(axis.text.x = element_text(angle = 45, hjust = 1))
    }
    Plot_Err(errs)
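To see what AIC() and BIC() add on top of the deviance, the penalties can be recomputed by hand from logLik(). This is a minimal sketch on a small model chosen only for illustration (bwt ~ age + lwt + smoke on the raw data); fit, k, aic_by_hand and bic_by_hand are hypothetical names, not part of the lab.

```r
# Sketch: rebuild AIC and BIC from the log-likelihood of an lm fit.
lbw_raw <- MASS::birthwt
fit <- lm(bwt ~ age + lwt + smoke, data = lbw_raw)

n <- nrow(lbw_raw)
ll <- logLik(fit)
# Number of estimated parameters: regression coefficients plus the residual variance.
k <- attr(ll, "df")

deviance_m2ll <- -2 * as.numeric(ll)        # the "LogLik" column of the lab
aic_by_hand <- deviance_m2ll + 2 * k        # AIC penalty: 2 per parameter
bic_by_hand <- deviance_m2ll + log(n) * k   # BIC penalty: log(n) per parameter

print(all.equal(aic_by_hand, AIC(fit)))
print(all.equal(bic_by_hand, BIC(fit)))
```

With n = 189, log(n) is larger than 2, so BIC penalises every extra coefficient more heavily than AIC and tends to prefer smaller models.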

[Figure: err, errcp, errcv and errcvup for the Trivial and Full methods]

    Plot_LogLik <- function(errs) {
      ggplot(data = melt(select(errs, -matches("err")), id.vars = "method"),
             aes(x = method, y = value, color = variable)) +
        geom_point(size = 5) +
        theme(axis.text.x = element_text(angle = 45, hjust = 1))
    }
    Plot_LogLik(errs)

[Figure: LogLik, LogLikAIC and LogLikBIC for the Trivial and Full methods]

10. According to the summary, which variables can be removed from the model? Test this assumption by removing them, computing the errors and plotting them for the two models.

    reglbw2 <- update(reglbw, ~ . - age - ptl - ftv)
    summary(reglbw2)

    Call:
    lm(formula = bwt ~ lwt + race + smoke + ht + ui, data = lbw)

    Residuals:
    Min 1Q Median 3Q Max

    Coefficients:
    Estimate Std. Error t value Pr(>|t|)
    (Intercept) < 2e-16 ***
    lwt *
    raceblack **
    raceother **
    smokeyes ***
    htyes **
    uiyes ***
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: on 182 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:
    F-statistic: 9.6 on 6 and 182 DF, p-value: 3.601e-09

    errs <- rbind(errs, computeerrlm(reglbw2, "Simplified"))
    Plot_Err(errs)

[Figure: err, errcp, errcv and errcvup for the Trivial, Full and Simplified methods]

    Plot_LogLik(errs)

[Figure: LogLik, LogLikAIC and LogLikBIC for the Trivial, Full and Simplified methods]

    Find_Best <- function(errs) {
      nameserr <- names(errs)[-1]
      for (nameerr in nameserr) {
        # Report, for each criterion, the method with the smallest value
        writeLines(strwrap(paste(nameerr, ": ",
                                 errs[["method"]][which.min(errs[[nameerr]])],
                                 "(", min(errs[[nameerr]], na.rm = TRUE), ")")))
      }
    }
    Find_Best(errs)

    err : Full ( )
    errcp : Simplified ( )
    errcv : Simplified ( )
    errcvup : Simplified ( )
    LogLik : Full ( )
    LogLikAIC : Simplified ( )
    LogLikBIC : Simplified ( )

11. What would be the next simplification? Is it efficient?

    reglbw3 <- update(reglbw2, ~ . - lwt)
    summary(reglbw3)

    Call:
    lm(formula = bwt ~ race + smoke + ht + ui, data = lbw)

    Residuals:
    Min 1Q Median 3Q Max

    Coefficients:
    Estimate Std. Error t value Pr(>|t|)
    (Intercept) < 2e-16 ***
    raceblack **
    raceother ***
    smokeyes ***
    htyes *
    uiyes e-05 ***
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: on 183 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:
    F-statistic: on 5 and 183 DF, p-value: 1.98e-08

    errs <- rbind(errs, computeerrlm(reglbw3, "Simplified2"))
    Plot_Err(errs)

[Figure: err, errcp, errcv and errcvup for the Trivial, Full, Simplified and Simplified2 methods]

    Plot_LogLik(errs)
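A complementary way to judge whether dropping lwt is justified is a nested-model F test with anova(). This check is not part of the lab; it is a sketch, assuming only MASS is installed, that rebuilds the two models on the raw data (the binary predictors are left as 0/1 integers, which gives the same fit as the two-level factors). The names fit_with_lwt, fit_without_lwt and cmp are illustrative.

```r
# Sketch: F test comparing the model with and without lwt.
lbw_raw <- MASS::birthwt
lbw_raw$race <- factor(lbw_raw$race, levels = c(1, 2, 3),
                       labels = c("white", "black", "other"))

fit_with_lwt <- lm(bwt ~ lwt + race + smoke + ht + ui, data = lbw_raw)
fit_without_lwt <- update(fit_with_lwt, . ~ . - lwt)

# The smaller model goes first; the second row tests the extra term.
cmp <- anova(fit_without_lwt, fit_with_lwt)
print(cmp)

# For a single numeric term, the F test p-value equals the t test p-value
# reported for lwt in summary(fit_with_lwt).
p_f <- cmp[["Pr(>F)"]][2]
p_t <- summary(fit_with_lwt)$coefficients["lwt", "Pr(>|t|)"]
print(all.equal(p_f, p_t))
```

A small p-value here says the larger model fits significantly better in-sample; the cross-validated errors in the lab answer the different question of whether it also predicts better.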

[Figure: LogLik, LogLikAIC and LogLikBIC for the Trivial, Full, Simplified and Simplified2 methods]

    Find_Best(errs)

    err : Full ( )
    errcp : Simplified ( )
    errcv : Simplified ( )
    errcvup : Simplified ( )
    LogLik : Full ( )
    LogLikAIC : Simplified ( )
    LogLikBIC : Simplified ( )

12. Use glmulti from the package of the same name to test all the possible variable subsets without any interaction. (Use the level = 1 option!) What is the best model according to the AIC criterion?

    library(glmulti)
    bests <- glmulti(bwt ~ ., data = lbw, level = 1, family = "gaussian", plotty = FALSE) #You may use plott...

    Initialization...
    TASK: Exhaustive screening of candidate set.
    Fitting...
    After 50 models:
    Best model: bwt~1+race+smoke
    Crit= Mean crit=
    After 100 models:
    Best model: bwt~1+race+smoke+lwt
    Crit= Mean crit=
    After 150 models:
    Best model: bwt~1+race+smoke+ht+lwt
    Crit= Mean crit=
    After 200 models:
    Best model: bwt~1+race+smoke+ui+lwt
    Crit= Mean crit=
    After 250 models:
    Best model: bwt~1+race+smoke+ht+ui
    Crit= Mean crit=
    Completed.

13. bests@formulas contains a list of the best models. Use this to compute all the errors for the 50 best models. Compare those errors with those of our naive attempts.

    errmulti <- data.frame()
    for (f in 1:50) {
      model <- lm(bests@formulas[[f]], data = lbw)
      errmulti <- rbind(errmulti, computeerrlm(model, sprintf("Best_%g", f)))
    }
    errs_multi <- rbind(errs, errmulti)
    Plot_Err(errs_multi)
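The exhaustive screening is cheap at level 1 because the number of subsets is small, and a quick count shows why. The sketch below is plain arithmetic: it assumes 8 candidate predictors (the 9 remaining columns minus bwt) and, for the interaction case, that every pairwise interaction can be switched on or off freely. The count glmulti itself reports depends on its exclusion and marginality settings, so the second figure is only an order of magnitude.

```r
# Sketch: size of the model space with and without pairwise interactions.
n_terms <- 8                                  # age, lwt, race, smoke, ptl, ht, ui, ftv

# Each main effect is either in or out of the model.
n_main_effect_models <- 2^n_terms
print(n_main_effect_models)                   # 256

# Number of distinct pairs of predictors.
n_pairwise <- choose(n_terms, 2)
print(n_pairwise)                             # 28

# Main effects and pairwise interactions all toggled independently.
n_models_with_interactions <- 2^(n_terms + n_pairwise)
print(format(n_models_with_interactions, big.mark = ",", scientific = FALSE))
```

The 256 main-effect models match the log above, which reports progress up to 250 models before completing. Tens of billions of candidates, by contrast, cannot be fitted one by one in reasonable time, which is the motivation for a heuristic search.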

[Figure: err, errcp, errcv and errcvup by method (Trivial, Full, Simplified, Simplified2, Best_1 to Best_50)]

    Plot_LogLik(errs_multi)

[Figure: LogLik, LogLikAIC and LogLikBIC by method (Trivial, Full, Simplified, Simplified2, Best_1 to Best_50)]

    Find_Best(errmulti)

    err : Best_9 ( )
    errcp : Best_1 ( )
    errcv : Best_1 ( )
    errcvup : Best_4 ( )
    LogLik : Best_9 ( )
    LogLikAIC : Best_1 ( )
    LogLikBIC : Best_1 ( )

    Find_Best(errs_multi)

    err : Full ( )
    errcp : Simplified ( )
    errcv : Simplified ( )
    errcvup : Best_4 ( )
    LogLik : Full ( )
    LogLikAIC : Simplified ( )
    LogLikBIC : Simplified ( )

14. Add the interactions of level 2 and use glmulti with method = "d" to find the number of models. Is the exhaustive search possible?

    glmulti(bwt ~ ., data = lbw, level = 2, family = "gaussian", method = "d", plotty = FALSE) #You may use p...

    Initialization...
    TASK: Diagnostic of candidate set.
    Sample size:
    factor(s).
    4 covariate(s).
    0 f exclusion(s).
    0 c exclusion(s).
    0 f:f exclusion(s).
    0 c:c exclusion(s).
    0 f:c exclusion(s).
    Size constraints: min = 0 max = -1
    Complexity constraints: min = 0 max = -1
    Your candidate set contains models.
    [1]

15. Use the genetic algorithm of glmulti (method = "g") to explore those models and examine the best 25 solutions.

    bestsgen <- glmulti(bwt ~ ., data = lbw, level = 2, family = "gaussian", method = "g", plotty = FALSE) #Y...

    Initialization...
    TASK: Genetic algorithm in the candidate set.
    Initialization...
    Algorithm started...

    After 10 generations:
    Best model: bwt~1+race+smoke+ht+ui+age+lwt+ptl+lwt:age+ptl:age+ftv:ptl+smoke:age+smoke:ptl+ht:age+ht:
    Crit= Mean crit=
    Change in best IC: / Change in mean IC:

    After 20 generations:
    Best model: bwt~1+race+smoke+ui+age+lwt+ptl+ftv+lwt:age+ptl:age+ptl:lwt+ftv:age+smoke:age+smoke:ptl+h
    Crit= Mean crit=
    Change in best IC: / Change in mean IC:

    After 30 generations:
    Best model: bwt~1+race+smoke+age+lwt+ptl+ftv+ptl:age+ptl:lwt+ftv:age+smoke:age+ht:age+ht:lwt+ui:lwt+u
    Crit= Mean crit=
    Change in best IC: / Change in mean IC:

    After 40 to 100 generations (same report every 10 generations):
    Best model: bwt~1+race+smoke+age+lwt+ptl+ftv+ptl:age+ptl:lwt+ftv:age+smoke:age+ht:age+ht:lwt+ui:lwt+u
    Crit= Mean crit=
    Change in best IC: 0 / Change in mean IC:

    After 110 generations:
    Best model: bwt~1+race+smoke+age+lwt+ptl+ftv+ptl:lwt+ftv:age+smoke:age+ht:age+ht:lwt+ui:lwt+ui:ptl
    Crit= Mean crit=
    Change in best IC: / Change in mean IC:

    After 120 generations:
    Best model: bwt~1+race+age+lwt+ptl+ftv+ptl:lwt+ftv:age+smoke:age+ht:age+ht:lwt+ui:lwt+ui:ptl
    Crit= Mean crit=
    Change in best IC: / Change in mean IC:

    After 130 to 560 generations (same report every 10 generations):
    Best model: bwt~1+race+age+lwt+ptl+ftv+ptl:lwt+ftv:age+smoke:age+ht:age+ht:lwt+ui:lwt+ui:ptl
    Crit= Mean crit=
    Change in best IC: 0 / Change in mean IC:

    After 570 generations:
    Best model: bwt~1+race+age+lwt+ptl+ftv+ptl:lwt+ftv:age+smoke:age+ht:age+ht:lwt+ui:lwt+ui:ptl
    Crit= Mean crit=
    Improvements in best and average IC have bebingo en below the specified goals.
    Algorithm is declared to have converged.
    Completed.

    errgen <- data.frame()
    for (f in 1:25) {
      model <- lm(bestsgen@formulas[[f]], data = lbw)
      errgen <- rbind(errgen, computeerrlm(model, sprintf("BestGen_%g", f)))
    }
    errs_gen <- rbind(errs_multi, errgen)
    Plot_Err(errs_gen)

[Figure: err, errcp, errcv and errcvup by method (Trivial, Full, Simplified, Simplified2, Best_1 to Best_50, BestGen_1 to BestGen_25)]

    Plot_LogLik(errs_gen)

[Figure: LogLik, LogLikAIC and LogLikBIC by method (Trivial, Full, Simplified, Simplified2, Best_1 to Best_50, BestGen_1 to BestGen_25)]

    Find_Best(errgen)

    err : BestGen_19 ( )
    errcp : BestGen_1 ( )
    errcv : BestGen_2 ( )
    errcvup : BestGen_2 ( )
    LogLik : BestGen_19 ( )
    LogLikAIC : BestGen_1 ( )
    LogLikBIC : BestGen_2 ( )

    Find_Best(errs_gen)

    err : BestGen_19 ( )
    errcp : BestGen_1 ( )
    errcv : BestGen_2 ( )
    errcvup : BestGen_2 ( )
    LogLik : BestGen_19 ( )
    LogLikAIC : BestGen_1 ( )
    LogLikBIC : Simplified ( )

16. Use glmnet to try a regularization method to obtain a better model.

    # Design matrix with all main effects and pairwise interactions, no intercept column
    X <- model.matrix(bwt ~ .^2 - 1, data = lbw)
    Y <- lbw[["bwt"]]
    library("glmnet")
    lbw_lasso <- glmnet(X, Y, family = "gaussian")
    # One row per lambda, one column per coefficient
    coeffs_lbw_lasso <- cbind(data.frame(t(as.matrix(coef(lbw_lasso)))), lambda = lbw_lasso[["lambda"]])
    ggplot(data = melt(coeffs_lbw_lasso, "lambda"), aes(x = lambda, y = value, color = variable)) +
      geom_line() # the remainder of this line is cut off in the source

[Figure: lasso coefficient paths (value against lambda), one curve per main effect or interaction term]

    computeerrglmnet <- function(model, lambda, name) {
      # Empirical (training) mean squared error at this lambda
      err <- mean((Y - predict(model, X, lambda))^2)
      # Cp-type debiased error using the number of non-zero coefficients; the constants
      # were lost in the source and the factor (1 + 2 * p / n) is reconstructed
      errcp <- err * (1 + 2 * (sum(abs(coef(model, lambda)) > 0)) / nrow(lbw))
      errcvtmp <- matrix(0, nrow = 1, ncol = (T * V))
      for (v in 1:(T * V)) {
        Xtrain <- X[LbwFolds[[v]], ]
        Xtest <- X[-LbwFolds[[v]], ]
        Ytrain <- Y[LbwFolds[[v]]]
        Ytest <- Y[-LbwFolds[[v]]]
        regtmp <- glmnet(Xtrain, Ytrain, family = "gaussian", lambda = lambda)
        predtmp <- predict(regtmp, Xtest, lambda)
        errcvtmp[v] <- mean((Ytest - predtmp)^2)
      }
      errcv <- mean(errcvtmp)
      errcvup <- errcv + 2 * sd(errcvtmp) / sqrt(T * V)
      # No likelihood-based criteria for the penalized fit
      data.frame(method = name, err = err, errcp = errcp, errcv = errcv, errcvup = errcvup,
                 LogLik = NA, LogLikAIC = NA, LogLikBIC = NA)
    }

    # Same as computeerrlm, but working on lbwint (design-matrix columns plus bwt)
    computeerrlm2 <- function(model, name) {
      err <- mean((lbwint[["bwt"]] - predict(model))^2)
      errcp <- err * (1 + 2 * length(model[["coefficients"]]) / nrow(lbw))
      errcvtmp <- matrix(0, nrow = 1, ncol = (T * V))
      for (v in 1:(T * V)) {
        lbwtrain <- slice(lbwint, LbwFolds[[v]])
        lbwtest <- slice(lbwint, -LbwFolds[[v]])
        regtmp <- lm(model, data = lbwtrain)
        predtmp <- predict(regtmp, newdata = lbwtest)
        errcvtmp[v] <- mean((lbwtest[["bwt"]] - predtmp)^2)
      }
      errcv <- mean(errcvtmp)
      errcvup <- errcv + 2 * sd(errcvtmp) / sqrt(T * V)
      LogLik <- -2 * as.numeric(logLik(model))
      LogLikAIC <- AIC(model)
      LogLikBIC <- BIC(model)
      data.frame(method = name, err = err, errcp = errcp, errcv = errcv, errcvup = errcvup,
                 LogLik = LogLik, LogLikAIC = LogLikAIC, LogLikBIC = LogLikBIC)
    }

    errlambda <- data.frame()
    errlambdasup <- data.frame()
    dx <- data.frame(X)
    lbwint <- cbind(dx, bwt = Y)
    for (l in 1:length(lbw_lasso[["lambda"]])) {
      lambda <- lbw_lasso[["lambda"]][l]
      # Errors of the lasso fit itself
      errlambda <- rbind(errlambda, computeerrglmnet(lbw_lasso, lambda, sprintf("Lasso_%g", l)))
      # Least-squares refit on the support selected by the lasso
      subsetlambda <- which(abs(coef(lbw_lasso, lambda)[-1]) > 0)
      if (length(subsetlambda) > 0) {
        reglambda <- lm(bwt ~ ., data = mutate(select(dx, subsetlambda), bwt = Y))
        errlambdasup <- rbind(errlambdasup, computeerrlm2(reglambda, sprintf("LassoSup_%g", l)))
      }
    }
    errs_lasso <- rbind(errs_gen, errlambda, errlambdasup)
    Plot_Err(errs_lasso)
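The loop above cross-validates every lambda by hand. glmnet also provides cv.glmnet, which does the fold bookkeeping internally and returns two conventional choices of lambda. The sketch below is an alternative to the lab's loop, not a reproduction of it: it assumes the glmnet and MASS packages are installed, rebuilds the design matrix from the raw data, and uses an arbitrary seed; the names cv_fit, lambda_min, lambda_1se and n_selected are illustrative.

```r
# Sketch: choose lambda with glmnet's built-in cross-validation.
library(glmnet)

lbw_raw <- MASS::birthwt
lbw_raw$low <- NULL                        # bwt must not be predicted from low
lbw_raw$race <- factor(lbw_raw$race)       # race is categorical, not ordinal

# Main effects and pairwise interactions, no intercept column.
X <- model.matrix(bwt ~ .^2 - 1, data = lbw_raw)
Y <- lbw_raw$bwt

set.seed(1)                                # fold assignment is random
cv_fit <- cv.glmnet(X, Y, family = "gaussian", nfolds = 5)

lambda_min <- cv_fit$lambda.min            # lambda with the smallest CV error
lambda_1se <- cv_fit$lambda.1se            # largest lambda within one SE of that minimum

# Non-zero coefficients (intercept excluded) at the more conservative choice.
selected <- as.matrix(coef(cv_fit, s = "lambda.1se"))[-1, 1]
n_selected <- sum(selected != 0)
print(c(lambda_min = lambda_min, lambda_1se = lambda_1se, n_selected = n_selected))
```

The lambda.1se rule plays a role similar to the errcvup column in the lab: it favours a sparser model whose cross-validated error is statistically indistinguishable from the best one.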

[Figure: err, errcp, errcv and errcvup by method (Trivial, Full, Simplified, Simplified2, Best_1 to Best_50, BestGen_1 to BestGen_25, Lasso_1 to Lasso_93, LassoSup_2 to LassoSup_93)]

    Plot_LogLik(errs_lasso)

[Figure: LogLik, LogLikAIC and LogLikBIC by method (Trivial, Full, Simplified, Simplified2, Best_1 to Best_50, BestGen_1 to BestGen_25, Lasso_1 to Lasso_93, LassoSup_2 to LassoSup_93)]

Find_Best(errlambdasup)
err : LassoSup_54 ( )
errcp : LassoSup_7 ( )
errcv : LassoSup_7 ( )
errcvup : LassoSup_7 ( )
LogLik : LassoSup_54 ( )
LogLikAIC : LassoSup_7 ( )
LogLikBIC : LassoSup_5 ( )

Find_Best(errs_lasso)
err : LassoSup_54 ( )
errcp : BestGen_1 ( )
errcv : BestGen_2 ( )
errcvup : BestGen_2 ( )
LogLik : LassoSup_54 ( )
LogLikAIC : BestGen_1 ( )
LogLikBIC : LassoSup_5 ( )

17. Find a better model...
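The Find_Best output above names, for each criterion, the method that scores best on it. The selection step can be sketched as follows. This is only an illustration under stated assumptions: find_best_sketch and errs_toy are hypothetical names, the numbers are invented and are not results from the birthwt study, and the sketch assumes a table with one row per method, one column per criterion, and smaller values being better for every criterion. The Find_Best function actually used in this lab may be organised differently.

```r
# Hypothetical toy table: one row per method, one column per criterion.
# The numbers are made up for illustration only.
errs_toy <- data.frame(
  method = c("Trivial", "Full", "Lasso_10"),
  err    = c(5.2, 3.1, 3.4),
  errcv  = c(5.3, 3.9, 3.6),
  stringsAsFactors = FALSE
)

# For each criterion column, return the name of the method with the
# smallest value. Assumption: smaller is better for every column.
find_best_sketch <- function(errs) {
  criteria <- setdiff(names(errs), "method")
  sapply(criteria, function(crit) errs$method[which.min(errs[[crit]])])
}

best <- find_best_sketch(errs_toy)
print(best)
```

In this toy table the training error favours the Full model while the cross-validated error favours Lasso_10, which is the same kind of disagreement seen above between err and errcv.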


More information

Preparation for Salinity Control ME 121

Preparation for Salinity Control ME 121 Preparation for Salinity Control ME 121 This document describes a set of measurements and analyses that will help you to write an Arduino program to control the salinity of water in your fish tank. The

More information

Quantitative Methods for Economics Tutorial 6. Katherine Eyal

Quantitative Methods for Economics Tutorial 6. Katherine Eyal Quantitative Methods for Economics Tutorial 6 Katherine Eyal TUTORIAL 6 13 September 2010 ECO3021S Part A: Problems 1. (a) In 1857, the German statistician Ernst Engel formulated his famous law: Households

More information

Chapter 13. Factorial ANOVA. Patrick Mair 2015 Psych Factorial ANOVA 0 / 19

Chapter 13. Factorial ANOVA. Patrick Mair 2015 Psych Factorial ANOVA 0 / 19 Chapter 13 Factorial ANOVA Patrick Mair 2015 Psych 1950 13 Factorial ANOVA 0 / 19 Today s Menu Now we extend our one-way ANOVA approach to two (or more) factors. Factorial ANOVA: two-way ANOVA, SS decomposition,

More information

Driver Behavior at Highway-Rail Grade Crossings With Passive Traffic Controls

Driver Behavior at Highway-Rail Grade Crossings With Passive Traffic Controls 2014 Global Level Crossing Symposium August 2014, Urbana, IL, USA Driver Behavior at Highway-Rail Grade Crossings With Passive Traffic Controls - A Driving Simulator Study Presenter: Dr. Asad J. Khattak

More information

Novel empirical correlations for estimation of bubble point pressure, saturated viscosity and gas solubility of crude oils

Novel empirical correlations for estimation of bubble point pressure, saturated viscosity and gas solubility of crude oils 86 Pet.Sci.(29)6:86-9 DOI 1.17/s12182-9-16-x Novel empirical correlations for estimation of bubble point pressure, saturated viscosity and gas solubility of crude oils Ehsan Khamehchi 1, Fariborz Rashidi

More information

Class 23: Chapter 14 & Nested ANOVA NOTES: NOTES: NOTES:

Class 23: Chapter 14 & Nested ANOVA NOTES: NOTES: NOTES: Slide 1 Chapter 13: ANOVA for 2-way classifications (2 of 2) Fixed and Random factors, Model I, Model II, and Model III (mixed model) ANOVA Chapter 14: Unreplicated Factorial & Nested Designs Slide 2 HW

More information

Modelling residential prices with cointegration techniques and automatic selection algorithms

Modelling residential prices with cointegration techniques and automatic selection algorithms Modelling residential prices with cointegration techniques and automatic selection algorithms Ramiro J. Rodríguez A presentation for ERES 2014 - PhD Sessions Bucharest, Rumania The opinions and analyses

More information

ACCIDENT MODIFICATION FACTORS FOR MEDIANS ON FREEWAYS AND MULTILANE RURAL HIGHWAYS IN TEXAS

ACCIDENT MODIFICATION FACTORS FOR MEDIANS ON FREEWAYS AND MULTILANE RURAL HIGHWAYS IN TEXAS Fitzpatrick, Lord, Park ACCIDENT MODIFICATION FACTORS FOR MEDIANS ON FREEWAYS AND MULTILANE RURAL HIGHWAYS IN TEXAS Kay Fitzpatrick Senior Research Engineer Texas Transportation Institute, 335 TAMU College

More information

Stat 5100 Handout #27 SAS: Variations on Ordinary Least Squares (LASSO and Elastic Net)

Stat 5100 Handout #27 SAS: Variations on Ordinary Least Squares (LASSO and Elastic Net) Stat 5100 Handout #27 SAS: Variations on Ordinary Least Squares (LASSO and Elastic Net) Example: (Baseball) This data set (from the SAS Help) contains salary (for 1987) and performance (1986 and some career)

More information

Robust Imputation of Missing Values in Compositional Data Using the R-Package robcompositions

Robust Imputation of Missing Values in Compositional Data Using the R-Package robcompositions Robust Imputation of Missing Values in Compositional Data Using the R-Package robcompositions Matthias Templ 1,2, Peter Filzmoser 1, Karel Hron 3 1 Department of Statistics and Probability Theory, TU WIEN,

More information

CPUE standardization of black marlin (Makaira indica) caught by Taiwanese large scale longline fishery in the Indian Ocean

CPUE standardization of black marlin (Makaira indica) caught by Taiwanese large scale longline fishery in the Indian Ocean CPUE standardization of black marlin (Makaira indica) caught by Taiwanese large scale longline fishery in the Indian Ocean Sheng-Ping Wang Department of Environmental Biology and Fisheries Science, National

More information