ISyE 6414: Regression Analysis
1 ISyE 6414: Regression Analysis
Lectures: MWF 8:00-10:30, MRDC #2404. Early five-week session, May 14 - June 15 (8:00-9:10; 10-min break; 9:20-10:30).
Instructor: Dr. Yajun Mei (pronounced "YA-JUNE MAY"), ymei@isye.gatech.edu; Tel: (O)
Office hours: MWF 10:30-11:00, after class, or in Groseclose #343.
Course homepage: Canvas (all HWs are submitted on Canvas); backup:
HW#1 is due on Friday, May 18 for on-campus students, and on Wednesday, May 23 for distance-learning students.
2 My academic pathway
Undergraduate: BS in Math, Peking Univ., 1996.
Worked as a computer programmer in a Chinese bank.
Graduate: PhD in Math with a minor in EE, Caltech (advisor: Dr. Gary Lorden).
Postdoc in biostatistics: FHCRC, Seattle, Sep 2005 (supervisor: Dr. Sarah Holte).
New Research Fellow: SAMSI & Duke Univ., Fall 2005.
Joined ISyE at GT in January; currently a tenured associate professor.
3 About this course
Regression analysis is a key building block for many modern machine learning, artificial intelligence, and business analytics techniques and methods (such as neural networks, deep learning, boosting, and random forests). This course aims to help you
understand its theoretical aspects (HW#1, #2, #4, and a midterm), and
understand its computational aspects (HW#3 and a course project).
4 Organization of the Course
Textbooks (notes/slides provided):
Kutner, Nachtsheim, Neter and Li, Applied Linear Statistical Models, 5th ed.
Faraway, Practical Regression and ANOVA using R (freely downloadable online).
Topics:
Simple linear regression (Ch 1-4)
Multiple linear regression (Ch 5-11) (2 weeks, midterm)
Advanced regression (Ch 13-14) (2 weeks)
Design of experiments (Ch 13, 14)
5 Organization of the Course (Grading)
Grading policy (the past average GPA is in [3.7, 3.9]):
Class attendance (5%).
Homework (4 x 10% = 40%): collaboration is encouraged, but you may not look at any other solutions before submitting.
One in-class midterm (25%): 9:15-10:30am, Friday, May 25 (happy Memorial Day weekend).
Class project (30%): a team of 2-4 students, or by yourself. See the handout for possible project topics.
Proposal (1-3 pages): May 30 (Wed).
Presentation file: due 7am on June 13 (Wed) (on-campus students only; not required for DL students).
Final report: June 15 (Fri).
[Distance-learning students only: a two-lecture delay for homework and the project proposal, and a one-week delay for the midterm and the final report.]
6 Part A: Basic Background on Probability and Statistics
We might not discuss this background in detail, but these slides are included so that you can brush up if necessary.
Three key probability distributions: Binomial, Poisson, and Normal.
7 Probability Review (see Appendix A of our text)
Probability; discrete random variables; continuous random variables; joint distributions.
8 Probability: Basics of Probability Theory
Random experiment: e.g., flip a fair coin three times and observe heads or tails.
Sample space: the set of all possible outcomes, e.g., S = {HHH, THH, HTH, HHT, HTT, THT, TTH, TTT}.
Event: a subset of the sample space of a random experiment, e.g., observing exactly one head.
Further topics: union/intersection/complement of events; counting techniques; axioms of probability; conditional probability; independence; Bayes' theorem.
9 Random Variable
A random variable is a function that assigns a real number to each outcome in the sample space of a random experiment.
Example: let X be the number of heads when flipping a fair coin three times. Rigorously, X is the mapping
ω:     HHH  HHT  HTH  THH  HTT  THT  TTH  TTT
X(ω):   3    2    2    2    1    1    1    0
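The mapping can be checked by brute force. The R sketch below (our own illustration; the object names are not from the course materials) enumerates the eight equally likely outcomes, computes X(ω) for each, and tabulates the resulting PMF.

```r
# All 2^3 = 8 outcomes of three fair coin flips, one row per outcome
outcomes <- expand.grid(flip1 = c("H", "T"), flip2 = c("H", "T"),
                        flip3 = c("H", "T"), stringsAsFactors = FALSE)

# X(omega) = number of heads in each outcome
X <- rowSums(outcomes == "H")

# Each outcome has probability 1/8, so P(X = k) = (number of outcomes with k heads) / 8
pmf <- table(X) / nrow(outcomes)
print(pmf)
```

The table gives P(X = 0), ..., P(X = 3) = 1/8, 3/8, 3/8, 1/8, which is the Binomial(3, 0.5) PMF.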
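10 Discrete Random Variable
X takes countably many possible values.
Probability mass function: $f(x) = P(X = x)$
Cumulative distribution function: $F(x) = P(X \le x) = \sum_{x_i \le x} f(x_i)$
Mean: $\mu = E(X) = \sum_x x f(x)$
Variance: $\sigma^2 = \mathrm{Var}(X) = E(X - \mu)^2 = \sum_x (x - \mu)^2 f(x)$
Standard deviation: $\sigma = \sqrt{\sigma^2}$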
11 Important Discrete RVs
Discrete uniform, Binomial(n, p), Geometric(p), Poisson(λ). What are their means and variances/SDs?
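One way to check answers to the question on this slide is to compute the moments directly from each PMF. A minimal R sketch with illustrative parameter values of our choosing (n = 10, p = 0.3, λ = 4); infinite supports are truncated where the remaining probability mass is negligible. Note that R's dgeom() counts failures before the first success, so its mean is (1 - p)/p rather than the 1/p of the number-of-trials version (the variance (1 - p)/p^2 is the same for both).

```r
# E(X) = sum x f(x) and Var(X) = sum (x - mu)^2 f(x), computed from a PMF
pmf_moments <- function(support, probs) {
  mu <- sum(support * probs)
  c(mean = mu, var = sum((support - mu)^2 * probs))
}

n <- 10; p <- 0.3
binom_m <- pmf_moments(0:n, dbinom(0:n, size = n, prob = p))     # n p, n p (1 - p)

lambda <- 4
pois_m <- pmf_moments(0:100, dpois(0:100, lambda))               # lambda, lambda

# Geometric as failures before the first success (R's convention)
geom_m <- pmf_moments(0:2000, dgeom(0:2000, prob = p))           # (1 - p)/p, (1 - p)/p^2

rbind(binomial = binom_m, poisson = pois_m, geometric = geom_m)
```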
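12 Continuous Random Variable
Probability density function: $f(x) \ge 0$ with $P(a \le X \le b) = \int_a^b f(x)\,dx$
Cumulative distribution function: $F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du$
Mean: $\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
Variance: $\sigma^2 = \mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$
Standard deviation: $\sigma = \sqrt{\sigma^2}$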
13 Important Continuous RVs
Gamma, Weibull, lognormal, and Beta distributions. What are their means and variances/SDs?
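For continuous distributions, E(X) and Var(X) can likewise be approximated by numerical integration. A sketch using R's integrate(); the Gamma(shape 3, rate 2) and Beta(2, 5) parameters are arbitrary illustrative choices, and the closed forms in the comments are the standard ones.

```r
# E(X) = integral of x f(x) dx, Var(X) = integral of (x - mu)^2 f(x) dx
density_moments <- function(f, lower, upper) {
  mu <- integrate(function(x) x * f(x), lower, upper)$value
  v  <- integrate(function(x) (x - mu)^2 * f(x), lower, upper)$value
  c(mean = mu, var = v)
}

# Gamma with shape a and rate b: mean a/b, variance a/b^2
a <- 3; b <- 2
gamma_m <- density_moments(function(x) dgamma(x, shape = a, rate = b), 0, Inf)

# Beta(al, be): mean al/(al + be), variance al*be / ((al + be)^2 (al + be + 1))
al <- 2; be <- 5
beta_m <- density_moments(function(x) dbeta(x, al, be), 0, 1)

rbind(gamma = gamma_m, beta = beta_m)
```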
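14 Central Limit Theorem
a. If X is Binomial(n, p), then for large n, $Z = \frac{X - np}{\sqrt{np(1-p)}} \approx N(0, 1)$ (use a continuity correction).
b. If $X_1, X_2, \dots, X_n$ are iid with mean $\mu$ and variance $\sigma^2$, then for large n, $\bar X \approx N(\mu, \sigma^2/n)$ (or $Z = \frac{\sum_{i=1}^n X_i - n\mu}{\sqrt{n}\,\sigma} = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \approx N(0, 1)$).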
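A quick numerical look at part (a): the R sketch below compares the exact binomial probability P(X ≤ 55) for n = 100, p = 0.5 (illustrative values of our choosing) with the normal approximation, with and without the continuity correction.

```r
# Normal approximation to Binomial(n, p):
# P(X <= k) is approximated by Phi((k + 0.5 - n p) / sqrt(n p (1 - p)))
n <- 100; p <- 0.5; k <- 55
exact     <- pbinom(k, size = n, prob = p)
approx_cc <- pnorm((k + 0.5 - n * p) / sqrt(n * p * (1 - p)))  # with continuity correction
approx_no <- pnorm((k - n * p) / sqrt(n * p * (1 - p)))        # without correction
round(c(exact = exact, with_cc = approx_cc, without_cc = approx_no), 4)
```

The corrected approximation lands much closer to the exact value, since the extra 0.5 accounts for approximating probability mass at the integers by a continuous density.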
15 Statistical Review
Population parameter vs. sample statistic; point estimation; confidence intervals; hypothesis testing.
16 Population Parameter vs. Sample Statistic
Population: the set of entities about which statistical inferences are to be drawn. Typically the population is very large, making a census (a complete enumeration of all values in the population) impractical or impossible.
Sample: a subset of observed objects from the population. The sample has a manageable size (though possibly still massive).
Parameter: a (typically unobservable) quantity that indexes a family of probability distributions. It can be regarded as a numerical characteristic of a population or a model.
Statistic: a measure of some attribute of a sample, calculated by applying a function to the values of the items in the sample.
[Population parameter vs. sample statistic]
17 Important Sample Statistics
Sample mean: $\bar x = \frac{1}{n}\sum_{i=1}^n x_i$
Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)^2$
Sample standard deviation: $s = \sqrt{s^2}$
Sample range: $r = \max(x_i) - \min(x_i)$
Quartiles: the lower quartile $q_1$: 25% of the data are less than $q_1$; the median $q_2$: 50% of the data are less than $q_2$; the upper quartile $q_3$: 75% of the data are less than $q_3$.
As a measure of variability, the interquartile range is defined as $\mathrm{IQR} = q_3 - q_1$.
Plots: stem-and-leaf diagrams, histograms, box plots, probability plots (e.g., normal QQ plots).
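For the under-five mortality data used later in the course, these statistics can be computed in base R as below. This is a sketch; note that quantile() supports several quartile conventions and defaults to type = 7, so hand-computed quartiles may differ slightly.

```r
# Under-five mortality rates (per 1000 live births) from the course example
y <- c(118, 65, 184, 8, 43, 12, 55, 208, 7, 9, 9, 124, 10, 6, 33, 16, 32, 145, 87, 9)

ybar <- mean(y)                           # sample mean
s2   <- var(y)                            # sample variance (divisor n - 1)
s    <- sd(y)                             # sample standard deviation
r    <- max(y) - min(y)                   # sample range
q    <- quantile(y, c(0.25, 0.5, 0.75))   # q1, median, q3 (type 7 by default)
iqr  <- IQR(y)                            # q3 - q1 with the same convention

c(mean = ybar, var = s2, sd = s, range = r, IQR = iqr)
boxplot(y, main = "Under-five mortality rate")   # a box plot of the same data
```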
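18 Normal Distribution
Assume $X_1, X_2, \dots, X_n$ are iid normal with mean $\mu$ and variance $\sigma^2$. Then
the sample mean satisfies $\bar X \sim N(\mu, \sigma^2/n)$, or $\frac{\sqrt{n}(\bar X - \mu)}{\sigma} \sim N(0, 1)$;
the sample variance $S^2 = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{n-1}$ satisfies $S^2 \sim \frac{\sigma^2}{n-1}\chi^2_{n-1}$, i.e., $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$ (chi-square distribution).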
19 Normal Distribution (Cont.)
Assume $X_1, X_2, \dots, X_n$ are iid normal with mean $\mu$ and variance $\sigma^2$. The sample mean $\bar X$ is independent of the sample variance $S^2 = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{n-1}$. Moreover,
$\frac{\sqrt{n}(\bar X - \mu)}{S} = \frac{N(0, 1)}{\sqrt{\chi^2_{n-1}/(n-1)}} \sim t_{n-1}$, a t-distribution with df = n - 1.
[In many cases, $\frac{\hat\theta - \theta}{\mathrm{s.e.}(\hat\theta)}$ has a t-distribution.]
In Appendix B on page 1317, the t-distribution critical point is tabulated as $t_{\alpha, df} = t(A; df)$ with $A = 1 - \alpha$, so $t_{0.025, 18} = t(0.975; 18) = 2.101$.
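In R, qt() plays the role of the Appendix B table. The sketch below assumes the slide's upper-tail convention, under which $t_{\alpha, df}$ is the point with upper-tail probability $\alpha$.

```r
# Upper-tail convention: P(T > t_{alpha, df}) = alpha,
# which is the textbook's t(A; df) with A = 1 - alpha.
alpha <- 0.05
df <- 18
t_crit <- qt(1 - alpha / 2, df)       # t_{0.025, 18} = t(0.975; 18)
t_crit_lower <- qt(alpha / 2, df)     # by symmetry, equals -t_crit
round(c(t_crit, t_crit_lower), 4)
```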
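20 Point Estimation
The bias of the estimator $\hat\theta$ is $\mathrm{Bias}(\hat\theta) = E(\hat\theta) - \theta$. An estimator is unbiased if its bias is 0.
The variance of the estimator: $\mathrm{Var}(\hat\theta)$.
The mean squared error of the estimator is $\mathrm{MSE}(\hat\theta) = E(\hat\theta - \theta)^2 = \mathrm{Var}(\hat\theta) + [\mathrm{Bias}(\hat\theta)]^2$.
The standard error of $\hat\theta$ is $\mathrm{s.e.} = \sqrt{\mathrm{Var}(\hat\theta)}$.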
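The decomposition of the MSE follows by adding and subtracting $E(\hat\theta)$; the cross term vanishes because $E[\hat\theta - E(\hat\theta)] = 0$:

```latex
\begin{aligned}
\mathrm{MSE}(\hat\theta)
  &= E\big[(\hat\theta - E\hat\theta) + (E\hat\theta - \theta)\big]^2 \\
  &= E(\hat\theta - E\hat\theta)^2
     + 2\,(E\hat\theta - \theta)\,E(\hat\theta - E\hat\theta)
     + (E\hat\theta - \theta)^2 \\
  &= \mathrm{Var}(\hat\theta) + 0 + [\mathrm{Bias}(\hat\theta)]^2 .
\end{aligned}
```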
21 Methods of Point Estimation
There are three methodologies for constructing point estimates of a population parameter:
A. Method of moments (MOM)
B. Method of maximum likelihood (MLE)
C. Bayesian estimation of parameters
22 MOM & MLE
The method of moments (MOM) estimators are found by equating population moments to the corresponding sample moments and solving the resulting equations, e.g., $h(\theta) = E(X) = \bar X = \frac{X_1 + \cdots + X_n}{n}$.
The maximum likelihood estimator (MLE) is the value of $\theta$ that maximizes the likelihood function $L(\theta) = f(x_1) f(x_2) \cdots f(x_n)$.
If the domain (support) of f(x) does not depend on $\theta$, solving $\frac{d \log L(\theta)}{d\theta} = 0$ yields the MLE. Otherwise, plot $L(\theta)$ and find its maximum.
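A small illustration in R where both methods can be compared: for Exponential(λ) data, E(X) = 1/λ, so MOM gives λ̂ = 1/X̄, and maximizing the log-likelihood numerically should give the same value. The simulated data, seed, and search interval are arbitrary choices for this sketch.

```r
# Exponential(rate = lambda) data, where E(X) = 1 / lambda
set.seed(6414)
x <- rexp(200, rate = 2)

# Method of moments: solve 1 / lambda = xbar
lambda_mom <- 1 / mean(x)

# Maximum likelihood: maximize log L(lambda) = sum log f(x_i; lambda) numerically
loglik <- function(lambda) sum(dexp(x, rate = lambda, log = TRUE))
lambda_mle <- optimize(loglik, interval = c(1e-6, 100), maximum = TRUE)$maximum

# Analytically, d log L / d lambda = n / lambda - sum(x) = 0 also gives 1 / xbar
c(MOM = lambda_mom, MLE = lambda_mle)
```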
23 Confidence Intervals & Hypothesis Testing
One sample:
1. Normal mean with known variance (one-sided)
2. Normal mean with unknown variance
3. Normal variance
4. Proportion of a binomial distribution
Two samples: inference on mean differences
5. Two independent normal distributions: variances known
6. Two independent normal distributions: variances unknown and equal
7. Two independent normal distributions: variances unknown and unequal
8. Paired samples
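As one worked instance of case 2 (normal mean, variance unknown), the sketch below computes the t confidence interval and the two-sided test by hand and cross-checks them against R's t.test(). The simulated sample and the null value μ0 = 10 are illustrative assumptions.

```r
# Case 2: 100(1 - alpha)% CI for mu is xbar +/- t_{alpha/2, n-1} * s / sqrt(n)
set.seed(1)
x <- rnorm(25, mean = 10, sd = 3)
n <- length(x); alpha <- 0.05; mu0 <- 10

half_width <- qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)
ci_manual <- mean(x) + c(-1, 1) * half_width

# Two-sided test of H0: mu = mu0 vs H1: mu != mu0
t_obs <- (mean(x) - mu0) / (sd(x) / sqrt(n))
p_value <- 2 * pt(-abs(t_obs), df = n - 1)

# Built-in equivalent
tt <- t.test(x, mu = mu0, conf.level = 1 - alpha)
tt
```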
24 Part B: Overview of Supervised Learning; Simple Linear Regression
25 Overview of Supervised Learning
Supervised learning (directed data mining, learning with a teacher): the observed data have the form $(Y_i, X_{i1}, \dots, X_{ip})$ for $i = 1, \dots, n$, where the variables split into two groups:
independent variables (explanatory variables, inputs, predictors) $X = (X_1, \dots, X_p)$, and
one (or more) dependent variables (outputs, responses) Y.
The objective is to predict Y given values of the inputs X.
26 Supervised Learning
Observed data (training data): $(Y_i, X_{i1}, \dots, X_{ip})$ for $i = 1, \dots, n$.
Objective: find a function $f(x_{new}) = f(x_1, \dots, x_p)$ that can predict Y well for any given input $x_{new} = (x_1, \dots, x_p)$.
Deterministic relationship? (many classification tasks in machine learning)
27 The Additive Error Model
Key statistical idea: observed data = true value + noise.
For the observed training data, $Y_i = f(x_{i1}, \dots, x_{ip}) + \varepsilon_i$ for $i = 1, \dots, n$, where the errors $\varepsilon_i$ are iid with mean 0 and are independent of the X's.
Goal: find the function $f(x_1, \dots, x_p)$, or an approximation of it. (Generative vs. predictive models.)
The simplest case: when $p = 1$ and $f(x) = \beta_0 + \beta_1 x$, we obtain simple linear regression: $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$.
28 The First Main Topic: Simple Linear Regression
29 Empirical Models: Regression
Many engineering and scientific problems are concerned with determining a relationship between a set of variables. For example, Y = first-year college GPA and X = high school GPA; or Y = mortality rate and X = immunization rate.
Knowledge of such a relationship would enable us to predict Y.
Regression analysis is a statistical technique well suited to these problems: it can be used to build a model that predicts Y at a given value of X.
30 Example: Immunization and Mortality
Suppose one wants to investigate the relationship between the percentage of children in a given country who have been immunized against the infectious diseases diphtheria, pertussis, and tetanus (DPT) and the corresponding mortality rate for children under five years of age in that country. The United Nations Children's Fund (UNICEF) considers the under-five mortality rate to be one of the most important indicators of the level of well-being for children.
31 Data
X = percentage of children immunized against DPT; Y = under-five mortality rate per 1000 live births, in 1992.

Nation           X    Y    Nation      X    Y    Nation     X    Y
Bolivia         77  118    Ethiopia   13  208    Mexico    91   33
Brazil          69   65    Finland    95    7    Poland    98   16
Cambodia        32  184    France     95    9    Russian   73   32
Canada          85    8    Greece     54    9    Senegal   47  145
China           94   43    India      89  124    Turkey    76   87
Czech Republic  99   12    Italy      95   10    UK        90    9
Egypt           89   55    Japan      87    6
32 Look at the Scatter Plot
The plot shows that the mortality rate tends to decrease as the percentage of immunized children increases.
33 Question
X = percentage of children immunized against DPT; Y = under-five mortality rate per 1000 live births, in 1992.
Question: Are Y and X related (associated), and how? Does better immunization lower the mortality rate? Can we use the data to develop a model for predicting the under-five mortality rate from the percentage of children immunized against DPT?
34 Linear Regression
Linear regression is interesting both theoretically, because of the elegance of the underlying theory, and from an applied point of view, because of its wide variety of uses. It fits models for a dependent variable as a function of one or more independent variables.
We will talk about: building models; assessing fit and reliability; drawing conclusions.
35 A Simple Linear Regression
We are interested in developing a linear equation that best summarizes the relationship, in a sample, between the response variable Y and the predictor (independent) variable x:
$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where the $\varepsilon_i$'s are independent with mean 0 and variance $\sigma^2$.
The equation is also used to predict Y from x.
36 (a) How to estimate the β's
Observe n data points $(Y_i, x_i)$ and assume $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where the $\varepsilon_i$'s are independent with mean 0 and variance $\sigma^2$. How do we estimate the β's?
37 Method of Least Squares
The (ordinary) least squares estimator chooses $\beta_0$ and $\beta_1$ to minimize the residual sum of squares (RSS):
$\mathrm{RSS}(\beta_0, \beta_1) = \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 x_i)^2$.
38 Why Least Squares?
It gives the maximum likelihood estimators (MLE) of $\beta_0$ and $\beta_1$ when the errors $\varepsilon_i$ are iid $N(0, \sigma^2)$.
It gives the best linear unbiased estimators (BLUE) of $\beta_0$ and $\beta_1$, whether or not the errors $\varepsilon_i$ are normally distributed (the Gauss-Markov theorem).
[A linear estimator has the form $\sum_{i=1}^n c_i Y_i$. For $\beta_1$, BLUE means: minimize $\mathrm{Var}\left(\sum c_i Y_i\right) = \sigma^2 \sum c_i^2$ subject to $E\left(\sum c_i Y_i\right) = \sum c_i(\beta_0 + \beta_1 x_i) = \beta_1$ for all $\beta_0$ and $\beta_1$, i.e., subject to $\sum c_i = 0$ and $\sum c_i x_i = 1$.]
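The first bullet can be seen directly from the log-likelihood of the model when the errors are iid $N(0, \sigma^2)$:

```latex
\begin{aligned}
\log L(\beta_0, \beta_1, \sigma^2)
  &= \sum_{i=1}^n \log\!\left[\frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left(-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right)\right] \\
  &= -\frac{n}{2}\log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 \\
  &= -\frac{n}{2}\log(2\pi\sigma^2) - \frac{\mathrm{RSS}(\beta_0, \beta_1)}{2\sigma^2}.
\end{aligned}
```

For any fixed $\sigma^2$, maximizing over $(\beta_0, \beta_1)$ is the same as minimizing RSS. (Maximizing over $\sigma^2$ as well gives $\mathrm{RSS}/n$, which differs from the unbiased estimator $\mathrm{RSS}/(n-2)$ used for $\sigma^2$ in regression.)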
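39 Method of Least Squares
Minimizing the residual sum of squares (RSS) gives the solutions
$\hat\beta_1 = \frac{S_{xy}}{S_{xx}}, \quad \hat\beta_0 = \bar y - \hat\beta_1 \bar x$,
where $S_{xx} = \sum (x_i - \bar x)^2 = \sum x_i^2 - n\bar x^2$ and $S_{xy} = \sum (x_i - \bar x)(y_i - \bar y) = \sum x_i y_i - n\bar x\bar y$.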
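These solutions come from setting the partial derivatives of RSS to zero (the normal equations):

```latex
\begin{aligned}
\frac{\partial\,\mathrm{RSS}}{\partial \beta_0}
  &= -2\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i) = 0
  \;\Longrightarrow\; \beta_0 = \bar y - \beta_1 \bar x, \\
\frac{\partial\,\mathrm{RSS}}{\partial \beta_1}
  &= -2\sum_{i=1}^n x_i (y_i - \beta_0 - \beta_1 x_i) = 0
  \;\Longrightarrow\; \sum_{i=1}^n x_i\big[(y_i - \bar y) - \beta_1 (x_i - \bar x)\big] = 0 .
\end{aligned}
```

Since $\sum x_i (y_i - \bar y) = S_{xy}$ and $\sum x_i (x_i - \bar x) = S_{xx}$, the second equation gives $\hat\beta_1 = S_{xy}/S_{xx}$, and the first then gives $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$.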
40 Example (Cont.)
X = percentage of children immunized against DPT; Y = under-five mortality rate per 1000 live births, in 1992 (the same 20 nations as in the data on slide 31).
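41 Answer
For our data, $n = 20$, $\bar x = 77.4$, $\bar y = 59$, $\sum x_i^2 = 130446$, $\sum x_i y_i = 68626$.
$S_{xx} = \sum (x_i - \bar x)^2 = \sum x_i^2 - n\bar x^2 = 130446 - 20(77.4)^2 = 10630.8$,
$S_{xy} = \sum (x_i - \bar x)(y_i - \bar y) = \sum x_i y_i - n\bar x\bar y = 68626 - 20(77.4)(59) = -22706$,
$\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{-22706}{10630.8} \approx -2.1359$; $\hat\beta_0 = \bar y - \hat\beta_1 \bar x = 59 + 2.1359 \times 77.4 \approx 224.32$.
Thus, the fitted simple linear regression model is $Y = 224.32 - 2.136x + \varepsilon$, or $E(Y) = 224.32 - 2.136x$.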
42 (b) Example (Cont.)
The fitted simple linear regression model is $Y = 224.32 - 2.136x + \varepsilon$.
Estimate the mean under-five mortality rate per 1000 live births when x = 10. Repeat the question for x = 90.
[$224.32 - 2.1359(10) \approx 202.96$; $224.32 - 2.1359(90) \approx 32.09$]
43 (c) How to estimate $\sigma^2$?
Recall that the model is $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where the $\varepsilon_i$'s are iid with mean 0 and variance $\sigma^2$. We have the estimators $\hat\beta_0$ and $\hat\beta_1$; how do we estimate the third parameter, $\sigma^2$?
Answer: it is natural to use the observed fitting errors (residuals) $e_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i)$ and the residual sum of squares $\mathrm{RSS} = \sum_{i=1}^n e_i^2$.
The estimator of $\sigma^2$ is $\hat\sigma^2 = \frac{\mathrm{RSS}}{n-2}$ [and, for normal errors, $\hat\sigma^2 \sim \frac{\sigma^2}{n-2}\chi^2_{n-2}$].
In practice, it is easier to compute RSS as $\mathrm{RSS} = \sum_{i=1}^n e_i^2 = S_{yy} - \hat\beta_1 S_{xy} = S_{yy} - \frac{S_{xy}^2}{S_{xx}}$, where $S_{yy} = \sum_{i=1}^n (y_i - \bar y)^2$.
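The shortcut formula for RSS follows from writing each residual in centered form, using $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$ and then $\hat\beta_1 S_{xx} = S_{xy}$:

```latex
\begin{aligned}
e_i &= y_i - \hat\beta_0 - \hat\beta_1 x_i = (y_i - \bar y) - \hat\beta_1 (x_i - \bar x), \\
\mathrm{RSS} = \sum_{i=1}^n e_i^2
  &= S_{yy} - 2\hat\beta_1 S_{xy} + \hat\beta_1^2 S_{xx}
   = S_{yy} - 2\hat\beta_1 S_{xy} + \hat\beta_1 S_{xy} \\
  &= S_{yy} - \hat\beta_1 S_{xy} = S_{yy} - \frac{S_{xy}^2}{S_{xx}} .
\end{aligned}
```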
44 Example (Cont.)
In our example, the fitted simple linear regression model is $Y = 224.32 - 2.136x + \varepsilon$. Find an estimate of $\sigma^2 = \mathrm{Var}(\varepsilon)$.
Two ways to calculate the residual sum of squares RSS:
(1) Calculate the observed fitting errors (residuals) $e_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i)$, and then $\mathrm{RSS} = \sum_{i=1}^n e_i^2 \approx 29001$.
(2) Use $S_{xx} = 10630.8$, $S_{xy} = -22706$, $S_{yy} = 77498$, and $\mathrm{RSS} = S_{yy} - \hat\beta_1 S_{xy} = S_{yy} - \frac{S_{xy}^2}{S_{xx}} = 77498 - \frac{(-22706)^2}{10630.8} \approx 29001$.
The estimator of $\sigma^2$ is $\hat\sigma^2 = \frac{\mathrm{RSS}}{n-2} \approx \frac{29001}{18} \approx 1611.2$ (or $\hat\sigma \approx 40.14$).
45 R code (calculator-type)
x <- c(77, 69, 32, 85, 94, 99, 89, 13, 95, 95, 54, 89, 95, 87, 91, 98, 73, 47, 76, 90)
y <- c(118, 65, 184, 8, 43, 12, 55, 208, 7, 9, 9, 124, 10, 6, 33, 16, 32, 145, 87, 9)
Sxx <- sum(x * x) - length(x) * mean(x)^2
Sxy <- sum(x * y) - length(x) * mean(x) * mean(y)
Syy <- sum(y * y) - length(y) * mean(y)^2
beta1hat <- Sxy / Sxx
beta0hat <- mean(y) - beta1hat * mean(x)
### Two ways to compute RSS
error <- y - (beta0hat + beta1hat * x)   # residuals e_i
RSS <- sum(error * error)
### Or, equivalently
RSS <- Syy - Sxy^2 / Sxx
sigma2hat <- RSS / (length(x) - 2)
c(beta0hat, beta1hat, sigma2hat)
46 (d) Properties of OLS Estimators
To derive statistical inference for the (ordinary) least squares estimators $\hat\beta_1$ and $\hat\beta_0$, we need to find $E(\hat\beta_i)$ and $\mathrm{Var}(\hat\beta_i)$.
Then, by the central limit theorem, asymptotically $\frac{\hat\beta_i - E(\hat\beta_i)}{\sqrt{\mathrm{Var}(\hat\beta_i)}} \approx N(0, 1)$.
47 Key Steps
$S_{xx} = \sum (x_i - \bar x)^2 = \sum x_i^2 - n\bar x^2$, $S_{xy} = \sum (x_i - \bar x)(y_i - \bar y) = \sum x_i y_i - n\bar x\bar y$.
Assumption: the $x_i$'s are constants, and the $Y_i$'s are independent with $E(Y_i) = \beta_0 + \beta_1 x_i$ and $\mathrm{Var}(Y_i) = \sigma^2$.
$\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \sum_{i=1}^n c_i Y_i$, where $c_i = \frac{x_i - \bar x}{S_{xx}}$ satisfy the following three properties:
$\sum_{i=1}^n c_i = 0$, $\sum_{i=1}^n c_i x_i = 1$, $\sum_{i=1}^n c_i^2 = \frac{1}{S_{xx}}$.
$\hat\beta_0 = \bar y - \hat\beta_1 \bar x = \sum_{i=1}^n \left(\frac{1}{n} - c_i \bar x\right) Y_i$.
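With the weights $c_i = (x_i - \bar x)/S_{xx}$ and their three properties, the moments of the estimators follow in a few lines, using the independence of the $Y_i$'s:

```latex
\begin{aligned}
E(\hat\beta_1) &= \sum c_i(\beta_0 + \beta_1 x_i) = \beta_0 \sum c_i + \beta_1 \sum c_i x_i = \beta_1, \\
\mathrm{Var}(\hat\beta_1) &= \sum c_i^2\, \mathrm{Var}(Y_i) = \sigma^2 \sum c_i^2 = \frac{\sigma^2}{S_{xx}}, \\
E(\hat\beta_0) &= \sum \Big(\tfrac{1}{n} - c_i \bar x\Big)(\beta_0 + \beta_1 x_i)
  = \beta_0 + \beta_1 \bar x - \bar x \beta_1 = \beta_0, \\
\mathrm{Var}(\hat\beta_0) &= \sigma^2 \sum \Big(\tfrac{1}{n} - c_i \bar x\Big)^2
  = \sigma^2 \Big(\tfrac{1}{n} - \tfrac{2\bar x}{n}\sum c_i + \bar x^2 \sum c_i^2\Big)
  = \sigma^2 \Big(\tfrac{1}{n} + \tfrac{\bar x^2}{S_{xx}}\Big), \\
\mathrm{Cov}(\hat\beta_0, \hat\beta_1) &= \sigma^2 \sum \Big(\tfrac{1}{n} - c_i \bar x\Big) c_i
  = -\frac{\bar x\, \sigma^2}{S_{xx}} .
\end{aligned}
```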
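48 (d) Properties of OLS
Unbiased: $E(\hat\beta_1) = \beta_1$, $E(\hat\beta_0) = \beta_0$.
Variance: $\mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{S_{xx}}$, $\mathrm{Var}(\hat\beta_0) = \sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right)$, where $S_{xx} = \sum (x_i - \bar x)^2$.
Note that they are correlated: $\mathrm{Cov}(\hat\beta_0, \hat\beta_1) = -\frac{\bar x\, \sigma^2}{S_{xx}}$.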
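These properties can also be checked by simulation. The R sketch below fixes the x values from the immunization example and repeatedly generates y from the model with hypothetical true values β0 = 224, β1 = -2.1, σ = 40 (chosen only to resemble the fitted model). The Monte Carlo means should be close to (β0, β1), and each empirical variance or covariance divided by its formula (σ²/S_xx, σ²(1/n + x̄²/S_xx), -x̄σ²/S_xx) should be close to 1.

```r
# Fixed design: immunization percentages from the course example
set.seed(6414)
x <- c(77, 69, 32, 85, 94, 99, 89, 13, 95, 95, 54, 89, 95, 87, 91, 98, 73, 47, 76, 90)
beta0 <- 224; beta1 <- -2.1; sigma <- 40    # hypothetical "true" values
n <- length(x); Sxx <- sum((x - mean(x))^2)

n_rep <- 5000
est <- t(replicate(n_rep, {
  y <- beta0 + beta1 * x + rnorm(n, sd = sigma)     # additive error model
  b1 <- sum((x - mean(x)) * (y - mean(y))) / Sxx    # Sxy / Sxx
  c(b0 = mean(y) - b1 * mean(x), b1 = b1)
}))

mc_means <- colMeans(est)                           # close to (beta0, beta1)
ratio_var_b1 <- var(est[, "b1"]) / (sigma^2 / Sxx)
ratio_var_b0 <- var(est[, "b0"]) / (sigma^2 * (1 / n + mean(x)^2 / Sxx))
ratio_cov    <- cov(est[, "b0"], est[, "b1"]) / (-mean(x) * sigma^2 / Sxx)
round(c(mc_means, var_b1 = ratio_var_b1, var_b0 = ratio_var_b0, cov = ratio_cov), 3)
```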
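49 CI and Tests
Since $\sigma^2$ is unknown, consider its estimator $\hat\sigma^2 = \frac{\mathrm{RSS}}{n-2}$, and thus the estimated standard errors
$\mathrm{se}(\hat\beta_1) = \frac{\hat\sigma}{\sqrt{S_{xx}}}$ and $\mathrm{se}(\hat\beta_0) = \hat\sigma\sqrt{\frac{1}{n} + \frac{\bar x^2}{S_{xx}}}$.
Then $\frac{\hat\beta_1 - \beta_1}{\mathrm{se}(\hat\beta_1)}$ and $\frac{\hat\beta_0 - \beta_0}{\mathrm{se}(\hat\beta_0)}$ have t-distributions with n - 2 degrees of freedom (for normal errors).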
50 (d1) Inference on $\beta_1$
When testing $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$, the test statistic is
$T_{obs} = \frac{\hat\beta_1}{\mathrm{se}(\hat\beta_1)} = \frac{\hat\beta_1}{\hat\sigma/\sqrt{S_{xx}}}$,
and we reject $H_0$ if $|T_{obs}| \ge t_{\alpha/2, n-2}$.
A $100(1-\alpha)\%$ confidence interval on $\beta_1$ is $\hat\beta_1 \pm t_{\alpha/2, n-2}\frac{\hat\sigma}{\sqrt{S_{xx}}}$.
51 Example (Cont.)
The fitted simple linear regression model is $Y = 224.32 - 2.136x + \varepsilon$. Test $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$ at the $\alpha = 0.05$ level.
[Recall $S_{xx} = 10630.8$, $\hat\sigma \approx 40.14$, $t_{\alpha/2, n-2} = t_{0.025, 18} = 2.101$.
$T_{obs} = \frac{\hat\beta_1}{\hat\sigma/\sqrt{S_{xx}}} = \frac{-2.1359}{40.14/\sqrt{10630.8}} \approx \frac{-2.1359}{0.3893} \approx -5.49$. Since $|T_{obs}| > 2.101$, we reject $H_0$.]
52 Example (Cont.)
The fitted simple linear regression model is $Y = 224.32 - 2.136x + \varepsilon$. Find a 95% confidence interval on $\beta_1$.
[Recall $S_{xx} = 10630.8$, $\hat\sigma \approx 40.14$, $t_{\alpha/2, n-2} = t_{0.025, 18} = 2.101$. So
$\hat\beta_1 \pm t_{\alpha/2, n-2}\frac{\hat\sigma}{\sqrt{S_{xx}}} = -2.1359 \pm 2.101 \times 0.3893 \approx -2.1359 \pm 0.8179 = [-2.954, -1.318]$.]
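The calculations on slides 51-52 can be reproduced in R. This sketch computes $T_{obs}$ and the 95% interval from the formulas, then cross-checks them against lm(), summary(), and confint().

```r
x <- c(77, 69, 32, 85, 94, 99, 89, 13, 95, 95, 54, 89, 95, 87, 91, 98, 73, 47, 76, 90)
y <- c(118, 65, 184, 8, 43, 12, 55, 208, 7, 9, 9, 124, 10, 6, 33, 16, 32, 145, 87, 9)
n <- length(x)
Sxx <- sum((x - mean(x))^2)
beta1hat <- sum((x - mean(x)) * (y - mean(y))) / Sxx
beta0hat <- mean(y) - beta1hat * mean(x)
sigmahat <- sqrt(sum((y - beta0hat - beta1hat * x)^2) / (n - 2))

se_b1   <- sigmahat / sqrt(Sxx)
T_obs   <- beta1hat / se_b1                     # test of H0: beta1 = 0
t_crit  <- qt(0.975, df = n - 2)                # t_{0.025, 18}
reject  <- abs(T_obs) >= t_crit
p_value <- 2 * pt(-abs(T_obs), df = n - 2)
ci_b1   <- beta1hat + c(-1, 1) * t_crit * se_b1 # 95% CI on beta1

# Cross-check with R's built-in regression tools
fm1 <- lm(y ~ x)
confint(fm1)["x", ]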
53 (d2) Inference on $\beta_0$
When testing $H_0: \beta_0 = b_0$ versus $H_1: \beta_0 \neq b_0$, the test statistic is
$T_{obs} = \frac{\hat\beta_0 - b_0}{\mathrm{se}(\hat\beta_0)} = \frac{\hat\beta_0 - b_0}{\hat\sigma\sqrt{\frac{1}{n} + \frac{\bar x^2}{S_{xx}}}}$,
and we reject $H_0$ if $|T_{obs}| \ge t_{\alpha/2, n-2}$.
A $100(1-\alpha)\%$ confidence interval on $\beta_0$ is $\hat\beta_0 \pm t_{\alpha/2, n-2}\,\hat\sigma\sqrt{\frac{1}{n} + \frac{\bar x^2}{S_{xx}}}$.
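54 Example (Cont.)
The fitted simple linear regression model is $Y = 224.32 - 2.136x + \varepsilon$. Test $H_0: \beta_0 = b_0$ versus $H_1: \beta_0 \neq b_0$ at the $\alpha = 0.05$ level.
[Recall $S_{xx} = 10630.8$, $\hat\sigma \approx 40.14$, $\bar x = 77.4$, $t_{\alpha/2, n-2} = t_{0.025, 18} = 2.101$.
$T_{obs} = \frac{\hat\beta_0 - b_0}{\hat\sigma\sqrt{\frac{1}{n} + \frac{\bar x^2}{S_{xx}}}} = \frac{224.32 - b_0}{40.14\sqrt{\frac{1}{20} + \frac{77.4^2}{10630.8}}} \approx \frac{224.32 - b_0}{31.44}$; reject $H_0$ if $|T_{obs}| \ge 2.101$.]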
55 Example (Cont.)
The fitted simple linear regression model is $Y = 224.32 - 2.136x + \varepsilon$. Find a 95% confidence interval on $\beta_0$.
[Recall $S_{xx} = 10630.8$, $\hat\sigma \approx 40.14$, $\bar x = 77.4$, $t_{\alpha/2, n-2} = t_{0.025, 18} = 2.101$. So
$\hat\beta_0 \pm t_{\alpha/2, n-2}\,\hat\sigma\sqrt{\frac{1}{n} + \frac{\bar x^2}{S_{xx}}} \approx 224.32 \pm 2.101 \times 31.44 \approx 224.32 \pm 66.05 \approx [158.26, 290.37]$.]
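A matching R sketch for the intercept. The null value b0_null is a hypothetical placeholder; it is set to 0 here, which is the test that summary(fm1) reports for the intercept.

```r
x <- c(77, 69, 32, 85, 94, 99, 89, 13, 95, 95, 54, 89, 95, 87, 91, 98, 73, 47, 76, 90)
y <- c(118, 65, 184, 8, 43, 12, 55, 208, 7, 9, 9, 124, 10, 6, 33, 16, 32, 145, 87, 9)
fm1 <- lm(y ~ x)
n <- length(x); xbar <- mean(x); Sxx <- sum((x - xbar)^2)

b0hat    <- unname(coef(fm1)[1])             # estimated intercept
sigmahat <- summary(fm1)$sigma               # sqrt(RSS / (n - 2))
se_b0    <- sigmahat * sqrt(1 / n + xbar^2 / Sxx)
t_crit   <- qt(0.975, df = n - 2)
ci_b0    <- b0hat + c(-1, 1) * t_crit * se_b0   # 95% CI on beta0

# Test H0: beta0 = b0_null (hypothetical null value)
b0_null  <- 0
T_obs_b0 <- (b0hat - b0_null) / se_b0
c(se_b0 = se_b0, lower = ci_b0[1], upper = ci_b0[2], T_obs = T_obs_b0)
```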
56 (d3) Inference on $\beta_0 + \beta_1 x_{new}$
For the simple linear regression model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ and a given $x_{new}$, what is the confidence interval for the mean response $E(Y) = \beta_0 + \beta_1 x_{new}$?
Point estimator: $\hat Y = \hat\beta_0 + \hat\beta_1 x_{new} = \sum_{i=1}^n \left[\frac{1}{n} + c_i (x_{new} - \bar x)\right] Y_i$.
$E(\hat\beta_0 + \hat\beta_1 x_{new}) = \beta_0 + \beta_1 x_{new}$,
$\mathrm{Var}(\hat\beta_0 + \hat\beta_1 x_{new}) = \sigma^2\left[\frac{1}{n} + \frac{(x_{new} - \bar x)^2}{S_{xx}}\right]$.
The $100(1-\alpha)\%$ confidence interval on the mean response is $\hat\beta_0 + \hat\beta_1 x_{new} \pm t_{\alpha/2, n-2}\,\hat\sigma\sqrt{\frac{1}{n} + \frac{(x_{new} - \bar x)^2}{S_{xx}}}$.
57 Example (Cont.)
The fitted simple linear regression model is $Y = 224.32 - 2.136x + \varepsilon$. Find a 95% confidence interval on the mean under-five mortality rate when x = 10.
[Recall $S_{xx} = 10630.8$, $\hat\sigma \approx 40.14$, $\bar x = 77.4$, $t_{0.025, 18} = 2.101$.
$\hat Y \pm t_{\alpha/2, n-2}\,\hat\sigma\sqrt{\frac{1}{n} + \frac{(x_{new} - \bar x)^2}{S_{xx}}} = 202.96 \pm 2.101 \times 40.14\sqrt{\frac{1}{20} + \frac{(10 - 77.4)^2}{10630.8}} \approx 202.96 \pm 58.26 \approx [144.70, 261.22]$.]
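The same interval can be obtained from the formula and from predict(); a sketch for $x_{new} = 10$.

```r
x <- c(77, 69, 32, 85, 94, 99, 89, 13, 95, 95, 54, 89, 95, 87, 91, 98, 73, 47, 76, 90)
y <- c(118, 65, 184, 8, 43, 12, 55, 208, 7, 9, 9, 124, 10, 6, 33, 16, 32, 145, 87, 9)
fm1 <- lm(y ~ x)
n <- length(x); xbar <- mean(x); Sxx <- sum((x - xbar)^2)
sigmahat <- summary(fm1)$sigma

x_new   <- 10
y_hat   <- unname(coef(fm1)[1] + coef(fm1)[2] * x_new)        # point estimate of E(Y)
se_mean <- sigmahat * sqrt(1 / n + (x_new - xbar)^2 / Sxx)    # s.e. of the mean response
ci_mean <- y_hat + c(-1, 1) * qt(0.975, df = n - 2) * se_mean

# Same interval from predict(): columns fit, lwr, upr
p <- predict(fm1, data.frame(x = x_new), interval = "confidence", level = 0.95)
p
```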
58 (e) Prediction of a New Observation
For the simple linear regression model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, how do we predict a future observation Y corresponding to a given $x_{new}$?
Point estimator: $\hat Y = \hat\beta_0 + \hat\beta_1 x_{new}$.
What about an interval for Y itself? Such an interval is called a prediction interval.
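59 Key Idea
For the future response $Y = \beta_0 + \beta_1 x_{new} + \varepsilon_{future}$, consider the estimator $\hat Y = \hat\beta_0 + \hat\beta_1 x_{new}$. Then $E(Y - \hat Y) = 0$ and, since $\varepsilon_{future}$ is independent of the training data (and hence of $\hat\beta_0$ and $\hat\beta_1$),
$\mathrm{Var}(Y - \hat Y) = \mathrm{Var}(\beta_0 + \beta_1 x_{new} + \varepsilon_{future} - \hat\beta_0 - \hat\beta_1 x_{new}) = \mathrm{Var}(\varepsilon_{future}) + \mathrm{Var}(\hat\beta_0 + \hat\beta_1 x_{new}) = \sigma^2 + \sigma^2\left[\frac{1}{n} + \frac{(x_{new} - \bar x)^2}{S_{xx}}\right]$.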
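60 Key Idea (Cont.)
For the future response $y = \beta_0 + \beta_1 x_{new} + \varepsilon$ and the estimate $\hat Y = \hat\beta_0 + \hat\beta_1 x_{new}$,
$\frac{y - \hat Y}{\sigma\sqrt{1 + \frac{1}{n} + \frac{(x_{new} - \bar x)^2}{S_{xx}}}} \sim N(0, 1)$, so $\frac{y - \hat Y}{\hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x_{new} - \bar x)^2}{S_{xx}}}} \sim t_{n-2}$.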
61 Prediction Interval
For the simple linear regression model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, how do we predict a future observation Y corresponding to a given $x_{new}$?
Point estimator: $\hat Y = \hat\beta_0 + \hat\beta_1 x_{new}$.
The $100(1-\alpha)\%$ prediction interval is $\hat Y \pm t_{\alpha/2, n-2}\,\hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x_{new} - \bar x)^2}{S_{xx}}}$.
62 Example (Cont.)
The fitted simple linear regression model is $Y = 224.32 - 2.136x + \varepsilon$. Find a 95% prediction interval on Y when x = 10.
[Recall $S_{xx} = 10630.8$, $\hat\sigma \approx 40.14$, $\bar x = 77.4$, $t_{0.025, 18} = 2.101$.
$\hat Y \pm t_{\alpha/2, n-2}\,\hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x_{new} - \bar x)^2}{S_{xx}}} = 202.96 \pm 2.101 \times 40.14\sqrt{1 + \frac{1}{20} + \frac{(10 - 77.4)^2}{10630.8}} \approx 202.96 \pm 102.50 \approx [100.46, 305.46]$.]
63 Example (Cont.)
The fitted simple linear regression model is $Y = 224.32 - 2.136x + \varepsilon$. Find a 95% prediction interval on Y when x = 90.
[Recall $S_{xx} = 10630.8$, $\hat\sigma \approx 40.14$, $\bar x = 77.4$, $t_{0.025, 18} = 2.101$.
$\hat Y \pm t_{\alpha/2, n-2}\,\hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x_{new} - \bar x)^2}{S_{xx}}} = 32.09 \pm 2.101 \times 40.14\sqrt{1 + \frac{1}{20} + \frac{(90 - 77.4)^2}{10630.8}} \approx 32.09 \pm 87.02 \approx [-54.94, 119.11]$.]
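A sketch reproducing both prediction intervals (x_new = 10 and 90) from the formula and comparing them with predict(..., interval = "prediction"). It also confirms that each prediction interval is wider than the confidence interval for the mean response at the same x_new, because of the extra $\sigma^2$ term for the new error.

```r
x <- c(77, 69, 32, 85, 94, 99, 89, 13, 95, 95, 54, 89, 95, 87, 91, 98, 73, 47, 76, 90)
y <- c(118, 65, 184, 8, 43, 12, 55, 208, 7, 9, 9, 124, 10, 6, 33, 16, 32, 145, 87, 9)
fm1 <- lm(y ~ x)
n <- length(x); Sxx <- sum((x - mean(x))^2)

x_new   <- c(10, 90)
y_hat   <- unname(coef(fm1)[1] + coef(fm1)[2] * x_new)
# The leading "1 +" is the variance of the future error, in units of sigma^2
se_pred <- summary(fm1)$sigma * sqrt(1 + 1 / n + (x_new - mean(x))^2 / Sxx)
t_crit  <- qt(0.975, df = n - 2)
pi_manual <- cbind(fit = y_hat, lwr = y_hat - t_crit * se_pred, upr = y_hat + t_crit * se_pred)

pi_lm <- predict(fm1, data.frame(x = x_new), interval = "prediction", level = 0.95)
ci_lm <- predict(fm1, data.frame(x = x_new), interval = "confidence", level = 0.95)
wider <- (pi_lm[, "upr"] - pi_lm[, "lwr"]) > (ci_lm[, "upr"] - ci_lm[, "lwr"])
pi_manual
```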
64 Summary (I): Point Estimation
Assume that we observe $(x_i, y_i)$ for $i = 1, \dots, n$, and we consider the simple linear regression model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where the $\varepsilon_i$'s are iid with mean 0 and variance $\sigma^2$. Define
$S_{xx} = \sum (x_i - \bar x)^2 = \sum x_i^2 - n\bar x^2$,
$S_{xy} = \sum (x_i - \bar x)(y_i - \bar y) = \sum x_i y_i - n\bar x\bar y$,
$S_{yy} = \sum (y_i - \bar y)^2 = \sum y_i^2 - n\bar y^2$.
The least squares estimators are $\hat\beta_1 = \frac{S_{xy}}{S_{xx}}$, $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$.
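65 Summary (II): Estimation of $\sigma^2$ and Inference
The estimator of $\sigma^2$ is $\hat\sigma^2 = \frac{\mathrm{RSS}}{n-2}$, where $\mathrm{RSS} = \sum_{i=1}^n e_i^2$ and the residuals are $e_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i)$.
In practice, it is easier to use $\mathrm{RSS} = \sum_{i=1}^n e_i^2 = S_{yy} - \hat\beta_1 S_{xy} = S_{yy} - \frac{S_{xy}^2}{S_{xx}}$.
$\frac{\hat\beta_1 - \beta_1}{\mathrm{se}(\hat\beta_1)} \sim t_{n-2}$, with $\mathrm{se}(\hat\beta_1) = \frac{\hat\sigma}{\sqrt{S_{xx}}}$;
$\frac{\hat\beta_0 - \beta_0}{\mathrm{se}(\hat\beta_0)} \sim t_{n-2}$, with $\mathrm{se}(\hat\beta_0) = \hat\sigma\sqrt{\frac{1}{n} + \frac{\bar x^2}{S_{xx}}}$.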
66 Summary (III): Inference
At a given $x_{new}$, the point estimator of Y is $\hat Y = \hat\beta_0 + \hat\beta_1 x_{new}$.
A $100(1-\alpha)\%$ confidence interval on the mean response $E(Y)$ is $\hat Y \pm t_{\alpha/2, n-2}\,\hat\sigma\sqrt{\frac{1}{n} + \frac{(x_{new} - \bar x)^2}{S_{xx}}}$.
A $100(1-\alpha)\%$ prediction interval on a future observation (appropriate for testing data) is $\hat Y \pm t_{\alpha/2, n-2}\,\hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x_{new} - \bar x)^2}{S_{xx}}}$.
67 Part C: Introduction to R
68 What is R
R is a system for statistical computation and graphics. It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files.
Free software. OS: Windows, Unix, Linux.
Homepage:
69 Installing R Under Windows
Requires Windows (32- or 64-bit).
Go to any CRAN site (see mirrors.html for a list) and follow the instructions under "Download R for Windows".
Download the installer (R ... win.exe, about 54 MB), double-click on its icon, and follow the instructions to install.
70 Data with R
Objects: vector, factor, array, matrix, data.frame, ts, list. Each object has a mode (numeric, character, complex, or logical) and a length.
Reading data stored in text (ASCII) files: read.table(), scan(), and read.fwf().
Saving data: write(x, file = "data.txt") writes a vector or matrix; write.table() writes a data.frame to a file.
Generating data
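A minimal round trip through a text file, using a temporary file path from tempfile() so that the working directory is not touched. The last lines show seq(), rep(), and rnorm() as common base-R ways of generating data; these particular examples are our choice, not taken from the slide.

```r
# Save a data.frame with write.table() and read it back with read.table()
dat <- data.frame(
  x = c(77, 69, 32, 85, 94, 99, 89, 13, 95, 95, 54, 89, 95, 87, 91, 98, 73, 47, 76, 90),
  y = c(118, 65, 184, 8, 43, 12, 55, 208, 7, 9, 9, 124, 10, 6, 33, 16, 32, 145, 87, 9)
)
path <- tempfile(fileext = ".txt")             # temporary file for this example
write.table(dat, file = path, row.names = FALSE)
dat2 <- read.table(path, header = TRUE)        # header = TRUE reads the column names
str(dat2)

# Generating data: regular sequences, repetition, and random draws
s1 <- seq(1, 10, by = 2)                       # 1 3 5 7 9
s2 <- rep(c("a", "b"), times = 3)              # "a" "b" "a" "b" "a" "b"
z  <- rnorm(5, mean = 0, sd = 1)               # five standard normal draws
```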
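71 Linear Regression in R
x <- c(77, 69, 32, 85, 94, 99, 89, 13, 95, 95, 54, 89, 95, 87, 91, 98, 73, 47, 76, 90)
y <- c(118, 65, 184, 8, 43, 12, 55, 208, 7, 9, 9, 124, 10, 6, 33, 16, 32, 145, 87, 9)
fm1 <- lm(y ~ x)
fm1
Printing fm1 shows the call, lm(formula = y ~ x), followed by the estimated coefficients (Intercept) and x; these are the least squares estimates $\hat\beta_0$ and $\hat\beta_1$ computed by hand on the earlier slides.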
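72 summary(fm1)
> summary(fm1)
The output lists:
the call, lm(formula = y ~ x);
the residual quantiles (Min, 1Q, Median, 3Q, Max);
the coefficient table with columns Estimate, Std. Error, t value, and Pr(>|t|), where both the intercept (p-value of order e-06) and x (p-value of order e-05) are marked ***;
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1;
the residual standard error ($\hat\sigma \approx 40.14$) on 18 degrees of freedom;
the multiple R-squared (about 0.626) and adjusted R-squared (about 0.605);
F-statistic: 30.1 on 1 and 18 DF, p-value: 3.281e-05.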
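The pieces of the summary can also be extracted programmatically. The sketch below pulls the coefficient table, $\hat\sigma$, and the F statistic out of summary(fm1), and checks two identities: in simple linear regression, F equals the square of the slope's t value, and the multiple R-squared equals $1 - \mathrm{RSS}/S_{yy}$.

```r
x <- c(77, 69, 32, 85, 94, 99, 89, 13, 95, 95, 54, 89, 95, 87, 91, 98, 73, 47, 76, 90)
y <- c(118, 65, 184, 8, 43, 12, 55, 208, 7, 9, 9, 124, 10, 6, 33, 16, 32, 145, 87, 9)
fm1 <- lm(y ~ x)
s <- summary(fm1)

coef_table <- coef(s)       # matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
sigmahat   <- s$sigma       # residual standard error, sqrt(RSS / (n - 2))
fstat      <- s$fstatistic  # named vector: value, numdf, dendf

# With a single predictor, F = (t value of the slope)^2
t_slope <- coef_table["x", "t value"]

# R-squared = 1 - RSS / Syy
r2_manual <- 1 - sum(residuals(fm1)^2) / sum((y - mean(y))^2)
c(F = unname(fstat["value"]), t_squared = t_slope^2, R2 = r2_manual)
```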
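73 Confidence Intervals on Coefficients
> confint(fm1)
returns a matrix with columns 2.5 % and 97.5 % and rows (Intercept) and x, i.e., the 95% intervals $\hat\beta_j \pm t_{0.025, 18}\,\mathrm{se}(\hat\beta_j)$: about [158.26, 290.37] for the intercept and [-2.954, -1.318] for x.
> confint(fm1, level = 0.99)
returns the 99% intervals, with columns 0.5 % and 99.5 %.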
74 Intervals for xnew
> xnew <- data.frame(x = c(10, 90))
## Confidence intervals on the mean response
> predict(fm1, xnew, interval = "confidence", level = 0.95)
## Prediction intervals for future observations
> predict(fm1, xnew, interval = "prediction", level = 0.95)
Each call returns a matrix with columns fit, lwr, and upr, one row per value of x in xnew.
More informationAPPENDIX A COMPUTATIONALLY GENERATED RANDOM DIGITS 748 APPENDIX C CHI-SQUARE RIGHT-HAND TAIL PROBABILITIES 754
IV Appendices APPENDIX A COMPUTATIONALLY GENERATED RANDOM DIGITS 748 APPENDIX B RANDOM NUMBER TABLES 750 APPENDIX C CHI-SQUARE RIGHT-HAND TAIL PROBABILITIES 754 APPENDIX D LINEAR INTERPOLATION 755 APPENDIX
More informationy ) s x x )(y i (x i r = 1 n 1 s y Statistics Lecture 7 Exploring Data , y 2 ,y n (x 1 ),,(x n ),(x 2 ,y 1 How two variables vary together
Statistics 111 - Lecture 7 Exploring Data Numerical Summaries for Relationships between Variables Administrative Notes Homework 1 due in recitation: Friday, Feb. 5 Homework 2 now posted on course website:
More informationUse of Auxiliary Variables and Asymptotically Optimum Estimators in Double Sampling
International Journal of Statistics and Probability; Vol. 5, No. 3; May 2016 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education Use of Auxiliary Variables and Asymptotically
More informationCombining Experimental and Non-Experimental Design in Causal Inference
Combining Experimental and Non-Experimental Design in Causal Inference Kari Lock Morgan Department of Statistics Penn State University Rao Prize Conference May 12 th, 2017 A Tribute to Don Design trumps
More informationLegendre et al Appendices and Supplements, p. 1
Legendre et al. 2010 Appendices and Supplements, p. 1 Appendices and Supplement to: Legendre, P., M. De Cáceres, and D. Borcard. 2010. Community surveys through space and time: testing the space-time interaction
More informationDriv e accu racy. Green s in regul ation