Running head: DATA ANALYSIS AND INTERPRETATION 1

Data Analysis and Interpretation Final Project
Vernon Tilly Jr.
University of Central Oklahoma

Owners of the Major League Baseball (MLB) teams are interested in learning ways to recruit, select, and retain their best players. Before addressing these items, some background on the MLB is useful. The MLB is a professional baseball league consisting of teams that play in the American and National leagues. It is one of the major professional sports leagues of the United States and Canada and is composed of 30 teams: 29 in the United States and one in Canada. The MLB has the highest season attendance of any sports league in [...]. There are approximately 1,200 players in the league.

We will use various pieces of data to assist the MLB team owners in these efforts, beginning with descriptive analysis to help them understand the data. We then run specific statistical tests, with analysis and interpretation of the results, for Salary, Homeruns (HR), and Batting Average (AVG). The goal is to determine whether there is a relationship between a player's Salary and his homeruns, and between Salary and his batting average.

We begin with Table 1, which lists the variables with their variable types, data types, and measurement scales. This information helps determine what can be done with the data. Take Team, for example: it is a qualitative variable, meaning it is non-numerical and descriptive in nature. Homeruns (HR), on the other hand, is a quantitative variable: it is numerical, so its values can be counted and compared meaningfully. Team is also cross-sectional data with a nominal measurement scale. Cross-sectional means the characteristic is recorded for many subjects at the same point in time, without regard to differences in time. Nominal is the least sophisticated scale: its values are only labels, so they can be counted and grouped but not ordered or averaged.
Table 1

Variable Name            Variable Type   Data Type         Measurement Scale
Name (of Players)        Qualitative     Cross-sectional   Nominal
Team (Name of Team)      Qualitative     Cross-sectional   Nominal
Salary (in dollars)      Quantitative    Continuous        Ratio
Games Played (G)         Quantitative    Discrete          Ratio
Hits (H)                 Quantitative    Discrete          Ratio
Homeruns (HR)            Quantitative    Discrete          Ratio
Runs Batted In (RBI)     Quantitative    Discrete          Ratio
Batting Average (AVG)    Quantitative    Continuous        Ratio

We will focus on three variables, Salary, Homeruns (HR), and Batting Average (AVG), all of which are quantitative. Salary and Batting Average (AVG) are continuous, which means they can take any value within an interval. Homeruns (HR), by contrast, is discrete: it takes only countable whole-number values such as 2, 3, or 4, because a player cannot hit half a homerun. The last item to address is the measurement scale, and all three variables have a ratio scale. The ratio scale is the strongest measurement scale because it has a true zero point: a salary of $0.00 means no money. Ratio data also support meaningful mathematical calculations, which we will use to arrive at a conclusion and recommendation for the owners.

Given a baseball data set containing a random sample of 254 players with their respective stats, we will investigate the linear relationship, if any, between baseball players' performance and pay. The performance variables, as discussed, are batting average (AVG) and homeruns (HR).

Table 2 gives the relative frequency of players by MLB league affiliation: about 47% of the 254 sampled players are in the National League and about 53% are in the American League. Figure 1 illustrates the same split for clarity.

Table 2. League, sample number of players, and relative frequency (rows: National, American, Total).

Figure 1. Percentage of the random sample of 254 players by MLB league affiliation (National 47%, American 53%).

For the same random sample of 254 players, Figure 2 below reports the mean, median, mode, skewness, and standard deviation for Salary, and Figure 3 reports the mean, standard deviation, and skewness for AVG. What we want the owners to take away is that the sample mean represents the average and is pulled toward outliers at either end of the distribution. The median is less sensitive to outliers and often gives a truer picture of a typical value. The mode is the value that occurs most frequently for each variable, such as Salary or AVG. The standard deviation measures the dispersion of the data points around the mean. Skewness measures the asymmetry of the distribution around the mean: the closer it is to zero, the more symmetric the distribution.

Figure 2. Descriptive statistics for Salary: Mean, Standard Error, Median, Mode, Standard Deviation, Sample Variance ([...]E+13), Kurtosis, Skewness, Range, Minimum, Maximum, Sum, Count (254), Confidence Level (95.0%).

Figure 3. Descriptive statistics for AVG: Mean, Standard Error, Median, Mode (0.28), Standard Deviation, Sample Variance, Kurtosis, Skewness, Range, Minimum (0.19), Maximum, Sum, Count (254), Confidence Level (95.0%).

In this sample the highest-paid player is Mr. Jason Giambi, with a salary of $23,428,571 and an average of [...]. Mr. Giambi's average falls at approximately the 76th percentile relative to the other players: approximately 75% have a lower average and approximately 23% have a higher average, with 3 other players sharing the same average of [...]. Note that the highest average in this random sample, [...], belongs to Mr. Ichiro Suzuki, who has a salary of $12,500,000.
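The summary measures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `describe` helper and the salary values are hypothetical (they are not the report's data), the skewness uses the adjusted sample formula that spreadsheet software commonly reports, and the 95% margin of error uses a normal approximation rather than the t distribution.

```python
import math
import statistics


def describe(values):
    """Return summary measures for one variable (hypothetical helper)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    # Adjusted sample skewness: positive means a longer right tail.
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    # Normal-approximation margin of error for a 95% confidence interval (assumption).
    z = statistics.NormalDist().inv_cdf(0.975)
    margin = z * sd / math.sqrt(n)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": sd,
        "skewness": skew,
        "ci95": (mean - margin, mean + margin),
    }


# Hypothetical salaries in $1,000s, chosen only to show a right-skewed sample.
salaries = [400, 400, 500, 750, 1200, 3000, 9000]
summary = describe(salaries)
```

In a right-skewed sample like this one, the mean sits well above the median because the few large salaries pull it upward, which is the effect of outliers described above.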
Using Figures 2 and 3 above, we find the 95% confidence interval for the population mean at $594,[...] for the mean salary and at [...] for the mean of AVG.

To better illustrate the players' salaries, Table 3 gives the relative frequency distribution and Figure 4 the relative frequency histogram. In Table 3 we can see approximately 80% of the players earn less than $9,000,000. The histogram in Figure 4 shows a positively skewed (skewed to the right) distribution with a long tail extending to the right. This reflects the presence of a small number of relatively large values.
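A relative frequency distribution of this kind can be built as in the following sketch. The class start ($350 thousand) and class width ($2,720 thousand) are taken from the class boundaries listed for Table 3; the function name and the salary values are hypothetical and are not the report's data.

```python
def relative_frequency_table(values, lower, width, n_classes):
    """Bin values into equal-width classes [lo, lo + width) and return one row per class."""
    counts = [0] * n_classes
    for v in values:
        idx = int((v - lower) // width)
        # Clamp out-of-range values into the first or last class.
        idx = min(max(idx, 0), n_classes - 1)
        counts[idx] += 1

    total = len(values)
    rows, cumulative = [], 0
    for i, count in enumerate(counts):
        cumulative += count
        lo = lower + i * width
        rows.append({
            "class": (lo, lo + width),
            "frequency": count,
            "relative": count / total,
            "cumulative_relative": cumulative / total,
        })
    return rows


# Hypothetical salaries in $1,000s, for illustration only.
salaries = [400, 450, 900, 2500, 3200, 4100, 6000, 8700, 12000, 23000]
table = relative_frequency_table(salaries, lower=350, width=2720, n_classes=9)
```

The cumulative relative column is what supports statements such as "approximately 80% of the players earn less than" a given amount: it is read off at the class whose upper bound is that amount.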

Table 3. Salary classes (in $1,000s) with relative, cumulative, and cumulative relative frequencies. The nine classes begin at $350, $3,070, $5,790, $8,510, $11,230, $13,950, $16,670, $19,390, and $22,110.

Figure 4. Salary of the Random Sample of 254 MLB Players (x-axis: Salary in $1,000s; y-axis: Percentage of Players).

For clarity, and to give the owners a different view to consider, Figure 5 below shows the relative frequency polygon for salary. The polygon gives a general idea of the shape of the distribution by plotting the relative frequency of each class at the midpoint of that class of players' salaries. It complements the histogram in Figure 4 above. From this illustration we can see 45% of the players earn less than $5,000,000. It also shows that most of the players earn less than $10,000,000, which is something to think about.

Figure 5. Random Sample of 254 MLB Players Salary (x-axis: Salary in $1,000s; y-axis: Percentage of Players).

To illustrate the players' homeruns, Table 4 gives the relative frequency distribution and Figure 6 the relative frequency histogram. In Table 4 we can see approximately 80% of the players hit fewer than 200 homeruns. The histogram in Figure 6 shows a positively skewed (skewed to the right) distribution with a long tail extending to the right. This reflects the presence of a small number of relatively large values.

Table 4. Number of Home Runs by interval, with relative, cumulative, and cumulative relative frequencies.

Figure 6. Homeruns (x-axis: Number of Homeruns).

For clarity, and to give the owners a different view to consider, Figure 7 below shows the relative frequency polygon for Homeruns. The polygon gives a general idea of the shape of the distribution by plotting the relative frequency of each class at the midpoint of that class of players' homeruns. It complements the histogram in Figure 6 above. From this illustration we can see approximately 60% of the players hit fewer than 100 homeruns. It also shows that most of the players hit fewer than approximately 250 homeruns, which is something to think about.

Figure 7. Random Sample of 254 MLB Players Homeruns (x-axis: Number of Homeruns).

To illustrate the players' batting average (AVG), Table 5 gives the relative frequency distribution and Figure 8 the relative frequency histogram. In Table 5 we can see approximately 82% of the players have an average of less than [...]. The histogram in Figure 8 shows a negatively skewed (skewed to the left) distribution with a long tail extending to the left. This reflects the presence of a small number of relatively small values.

Table 5. Batting Average (AVG) by interval, with relative, cumulative, and cumulative relative frequencies.

Figure 8. Batting Average (x-axis: Average; y-axis: Percentage of Players).

For clarity, and to give the owners a different view to consider, Figure 9 below shows the relative frequency polygon for Batting Average (AVG). The polygon gives a general idea of the shape of the distribution by plotting the relative frequency of each class at the midpoint of that class of players' batting averages. It complements the histogram in Figure 8 above. From this illustration we can see approximately 40% of the players' averages are just below [...]. It also shows that most of the players are within the [...] to [...] range on batting average, which is something to think about.

Figure 9. Random Sample of 254 MLB Players Batting Average (AVG) (x-axis: Average (AVG); y-axis: Percentage of Players).

Based on the raw stats from the random sample of 254 MLB players, and the information presented and interpreted in written and graphical form for Salary, Homeruns (HR), and Average (AVG), we now have a good idea of the individual characteristics of each variable. The question now is whether a linear relationship exists between these variables. To answer it we set up hypothesis tests: we will reject the null hypothesis in favor of the alternative hypothesis if the test leads in that direction, or fail to reject the null, which is the status quo. The null and alternative hypotheses for Salary & HR and for Salary & AVG are stated below.

Salary & HR
Null: H0: There is no linear relationship between Salary and HR.
Alternative: HA: There is a linear relationship between Salary and HR.

Salary & AVG
Null: H0: There is no linear relationship between Salary and AVG.
Alternative: HA: There is a linear relationship between Salary and AVG.

A simple way of comparing two variables is the scatter plot. Scatter plots make it quick to see whether there is a potential relationship between two variables in the random sample, here Salary against Homeruns (HR) and Salary against Average (AVG). They are provided as a precursor to the regression analysis that follows. Based on how the data points are dispersed around the trend line, it looks like there could be a linear relationship between Salary and each of the independent variables, HR and AVG, as presented in Figure 10 and Figure 11 respectively.

Figure 10. Salary to Homeruns (x-axis: Salary, $0 to $25,000,000; y-axis: Homeruns).

Figure 11. Salary to Average (AVG) (x-axis: Salary, $0 to $25,000,000; y-axis: Average).

We continue our testing using regression and correlation analysis at the 95% confidence level. Below are two graphs of the regressions, with Salary as the dependent variable, also known as the response variable, and HR and AVG respectively as the independent variables, also known as explanatory variables. Figure 12 represents Salary and Homeruns (HR), and Figure 13 represents Salary and Batting Average (AVG).

Figure 12. HR Line Fit Plot (x-axis: HR; y-axis: Salary, $0 to $25,000,000; series: Salary, Predicted Salary, Linear (Salary); trend line y = 27943x + 1E+06, R² = [...]).

We note here that the fitted regression equations are located in the upper right-hand corner of both Figure 12 and Figure 13. For simplicity they are listed below:

Salary & HR: Y = 27943x + 1E+06, or Y = 27943x + [...]
Salary & AVG: Y = 8E+07x - 2E+07, or Y = [...]x - [...]
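As a minimal sketch of how such a trend line is obtained and used, the code below fits a least-squares line and then applies the rounded Salary & HR equation quoted above. The function names and the four sample points are hypothetical; the coefficients 27943 and 1E+06 are the rounded values shown on the HR plot, so the predicted salary is only approximate.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Hypothetical points lying exactly on y = 2x + 1, to check the fit.
slope, intercept = least_squares([0, 1, 2, 3], [1, 3, 5, 7])

# Rounded coefficients from the HR trend line: predicted salary for 100 homeruns.
predicted_salary = predict(27943, 1_000_000, 100)  # 3,794,300 dollars
```

Read this way, the slope says each additional homerun is associated with roughly $27,943 more in predicted salary, which describes an association in the sample rather than a guarantee for any one player.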

Figure 13. AVG Line Fit Plot (x-axis: AVG; y-axis: Salary, ($5,000,000) to $25,000,000; series: Salary, Predicted Salary; trend line y = 8E+07x - 2E+07, R² = [...]).

Based on the regression summary, more specifically the coefficients reported alongside the ANOVA data, we find the respective slope and intercept values. For simplicity they are listed below:

Salary & HR: Slope: [...]; Intercept: [...]
Salary & AVG: Slope: [...]; Intercept: [...]

We ran a few more tests, reflected in Table 6. One is covariance, which tells us the direction of the linear relationship between two variables. We cannot tell much from it except that there appears to be a positive linear relationship. The correlation coefficient is a better measure because it gives both direction and strength. Based on the data in Table 6, the correlation of Salary to Homeruns is 0.72, which indicates a strong positive linear relationship, where 1 represents a perfect positive relationship and 0 represents no linear relationship. The correlation of Salary to Batting Average (AVG) is 0.39, which is still a positive linear relationship though much weaker than for Homeruns. R Squared, also known as the coefficient of determination, gives the percentage of the variation in Salary that each model explains. For Salary and Homeruns the model explains 52%, leaving 48% unexplained. The model for Salary and Batting Average explains 15%, leaving 85% unexplained. This may seem rather weak when spending millions of dollars; however, there is still a linear relationship. To test this we compare the p-value with the 5% level of significance, which corresponds to the 95% confidence level. As stated previously, we will reject the null hypothesis in favor of the alternative hypothesis if the test leads in that direction, or fail to reject the null, which is the status quo.

Table 6. Covariance and correlation between Salary & Home Runs; covariance and correlation between Salary & AVG; P-Value Salary - HR ([...]E-06); P-Value Salary - AVG ([...]E-07); R Squared Salary - HR; R Squared Salary - AVG.

Salary & HR: We reject the null hypothesis since [...] < 0.05.
Salary & AVG: We reject the null hypothesis since [...] < 0.05.

In conclusion, given a p-value of [...], the null hypothesis can be rejected for Salary and each of the two independent variables, Homeruns and AVG, at the 5% level of significance. Therefore the decision is to reject the null hypothesis in favor of the alternative hypothesis. The R Squared values show that Homeruns has a stronger relationship with Salary than Batting Average does, though both models support a linear relationship. The bottom line: there is a linear relationship.
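The link between the correlation coefficients, R Squared, and the significance test can be checked with a short sketch. It uses only the correlations quoted in the text (0.72 and 0.39) and the sample size of 254. The function name is hypothetical, and the p-value relies on a normal approximation to the t distribution with 252 degrees of freedom, so it will not reproduce the exact p-values from the regression output.

```python
import math
import statistics


def correlation_test(r, n):
    """Return (r_squared, t_statistic, approximate two-sided p-value) for correlation r."""
    r_squared = r * r
    # Test statistic for H0: no linear relationship, with n - 2 degrees of freedom.
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)
    # Normal approximation to the t distribution (assumption; reasonable for large n).
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return r_squared, t, p_value


hr_r2, hr_t, hr_p = correlation_test(0.72, 254)     # Salary and Homeruns
avg_r2, avg_t, avg_p = correlation_test(0.39, 254)  # Salary and Batting Average
```

Squaring 0.72 gives about 0.52 and squaring 0.39 gives about 0.15, matching the 52% and 15% of variation explained. Both test statistics are large enough that the approximate p-values fall far below 0.05, consistent with rejecting the null hypothesis in both cases.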

It is recommended that the MLB team owners review this report and ask any questions if they need more clarification on its contents. It appears to this analyst that a larger sample may be needed, possibly the entire population of MLB players, since the stats are available. The owners should consider that most of the bang for the buck is below the $10,000,000 level, and that most homeruns are hit below this level as well. While batting average does have a positive linear relationship to salary, it is not strong; there could be other variables to consider, such as age and games played. It also appears most averages fall in the [...] range, so there is not much negotiating room there. Anything paid above $15,000,000 is not a good return on performance.

To recruit new players: most are hungry just to get in the game, and the stats suggest it, as they want to prove they have what it takes to stay in the major leagues. Most seem to have good averages and numerous homeruns at the lower level of cost to the owners. It would appear $5,000,000 or less is a good start for recruiting young talent. To select and keep good players, there is plenty of negotiating room on salary between $5,000,000 and $15,000,000.


Practice Test Unit 06B 11A: Probability, Permutations and Combinations. Practice Test Unit 11B: Data Analysis Note to CCSD HS Pre-Algebra Teachers: 3 rd quarter benchmarks begin with the last 2 sections of Chapter 6 (probability, which we will refer to as 6B), and then address Chapter 11 benchmarks (which will

More information

Organizing Quantitative Data

Organizing Quantitative Data Organizing Quantitative Data MATH 130, Elements of Statistics I J. Robert Buchanan Department of Mathematics Fall 2018 Objectives At the end of this lesson we will be able to: organize discrete data in

More information

Reminders. Homework scores will be up by tomorrow morning. Please me and the TAs with any grading questions by tomorrow at 5pm

Reminders. Homework scores will be up by tomorrow morning. Please  me and the TAs with any grading questions by tomorrow at 5pm Reminders Homework scores will be up by tomorrow morning Please email me and the TAs with any grading questions by tomorrow at 5pm 1 Chapter 12: Describing Distributions with Numbers Aaron Zimmerman STAT

More information

Distancei = BrandAi + 2 BrandBi + 3 BrandCi + i

Distancei = BrandAi + 2 BrandBi + 3 BrandCi + i . Suppose that the United States Golf Associate (USGA) wants to compare the mean distances traveled by four brands of golf balls when struck by a driver. A completely randomized design is employed with

More information

Efficiency Wages in Major League Baseball Starting. Pitchers Greg Madonia

Efficiency Wages in Major League Baseball Starting. Pitchers Greg Madonia Efficiency Wages in Major League Baseball Starting Pitchers 1998-2001 Greg Madonia Statement of Problem Free agency has existed in Major League Baseball (MLB) since 1974. This is a mechanism that allows

More information

CHAPTER 1 ORGANIZATION OF DATA SETS

CHAPTER 1 ORGANIZATION OF DATA SETS CHAPTER 1 ORGANIZATION OF DATA SETS When you collect data, it comes to you in more or less a random fashion and unorganized. For example, what if you gave a 35 item test to a class of 50 students and collect

More information

Name May 3, 2007 Math Probability and Statistics

Name May 3, 2007 Math Probability and Statistics Name May 3, 2007 Math 341 - Probability and Statistics Long Exam IV Instructions: Please include all relevant work to get full credit. Encircle your final answers. 1. An article in Professional Geographer

More information

Unit 3 - Data. Grab a new packet from the chrome book cart. Unit 3 Day 1 PLUS Box and Whisker Plots.notebook September 28, /28 9/29 9/30?

Unit 3 - Data. Grab a new packet from the chrome book cart. Unit 3 Day 1 PLUS Box and Whisker Plots.notebook September 28, /28 9/29 9/30? Unit 3 - Data Grab a new packet from the chrome book cart 9/28 9/29 9/30? 10/3 10/4 10/5 10/6 10/7-10/10 10/11 10/12 10/13 Practice ACT #1 Lesson 1: Box and Whisker Plots I can find the 5 number summary

More information

Pitching Performance and Age

Pitching Performance and Age Pitching Performance and Age Jaime Craig, Avery Heilbron, Kasey Kirschner, Luke Rector and Will Kunin Introduction April 13, 2016 Many of the oldest and most long- term players of the game are pitchers.

More information

Unit 6, Lesson 1: Organizing Data

Unit 6, Lesson 1: Organizing Data Unit 6, Lesson 1: Organizing Data 1. Here is data on the number of cases of whooping cough from 1939 to 1955. a. Make a new table that orders the data by year. year number of cases 1941 222,202 1950 120,718

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Descriptive Statistics vs Inferential Statistics Describing a sample Making inferences to a larger population Data = Information but too much information. How do we summarize data?

More information

Chapter 2 - Frequency Distributions and Graphs

Chapter 2 - Frequency Distributions and Graphs - Frequency Distributions and Graphs 1. Which of the following does not need to be done when constructing a frequency distribution? A) select the number of classes desired B) find the range C) make the

More information

The Simple Linear Regression Model ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD

The Simple Linear Regression Model ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD The Simple Linear Regression Model ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD Outline Definition. Deriving the Estimates. Properties of the Estimates. Units of Measurement and Functional Form. Expected

More information

Practice Test Unit 6B/11A/11B: Probability and Logic

Practice Test Unit 6B/11A/11B: Probability and Logic Note to CCSD Pre-Algebra Teachers: 3 rd quarter benchmarks begin with the last 2 sections of Chapter 6, and then address Chapter 11 benchmarks; logic concepts are also included. We have combined probability

More information

Was John Adams more consistent his Junior or Senior year of High School Wrestling?

Was John Adams more consistent his Junior or Senior year of High School Wrestling? Was John Adams more consistent his Junior or Senior year of High School Wrestling? An investigation into my Dad s high school Wrestling Career Amanda Adams Period 1 Statistical Reasoning in Sports December

More information

Announcements. % College graduate vs. % Hispanic in LA. % College educated vs. % Hispanic in LA. Problem Set 10 Due Wednesday.

Announcements. % College graduate vs. % Hispanic in LA. % College educated vs. % Hispanic in LA. Problem Set 10 Due Wednesday. Announcements Announcements UNIT 7: MULTIPLE LINEAR REGRESSION LECTURE 1: INTRODUCTION TO MLR STATISTICS 101 Problem Set 10 Due Wednesday Nicole Dalzell June 15, 2015 Statistics 101 (Nicole Dalzell) U7

More information

Algebra 1 Unit 6 Study Guide

Algebra 1 Unit 6 Study Guide Name: Period: Date: Use this data to answer questions #1. The grades for the last algebra test were: 12, 48, 55, 57, 60, 61, 65, 65, 68, 71, 74, 74, 74, 80, 81, 81, 87, 92, 93 1a. Find the 5 number summary

More information

Equation 1: F spring = kx. Where F is the force of the spring, k is the spring constant and x is the displacement of the spring. Equation 2: F = mg

Equation 1: F spring = kx. Where F is the force of the spring, k is the spring constant and x is the displacement of the spring. Equation 2: F = mg 1 Introduction Relationship between Spring Constant and Length of Bungee Cord In this experiment, we aimed to model the behavior of the bungee cord that will be used in the Bungee Challenge. Specifically,

More information

Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA

Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only

More information

1. Answer this student s question: Is a random sample of 5% of the students at my school large enough, or should I use 10%?

1. Answer this student s question: Is a random sample of 5% of the students at my school large enough, or should I use 10%? Econ 57 Gary Smith Fall 2011 Final Examination (150 minutes) No calculators allowed. Just set up your answers, for example, P = 49/52. BE SURE TO EXPLAIN YOUR REASONING. If you want extra time, you can

More information

Week 7 One-way ANOVA

Week 7 One-way ANOVA Week 7 One-way ANOVA Objectives By the end of this lecture, you should be able to: Understand the shortcomings of comparing multiple means as pairs of hypotheses. Understand the steps of the ANOVA method

More information

Fundamentals of Machine Learning for Predictive Data Analytics

Fundamentals of Machine Learning for Predictive Data Analytics Fundamentals of Machine Learning for Predictive Data Analytics Appendix A Descriptive Statistics and Data Visualization for Machine learning John Kelleher and Brian Mac Namee and Aoife D Arcy john.d.kelleher@dit.ie

More information

Quantitative Literacy: Thinking Between the Lines

Quantitative Literacy: Thinking Between the Lines Quantitative Literacy: Thinking Between the Lines Crauder, Noell, Evans, Johnson Chapter 6: Statistics 2013 W. H. Freeman and Company 1 Chapter 6: Statistics Lesson Plan Data summary and presentation:

More information

TRIP GENERATION RATES FOR SOUTH AFRICAN GOLF CLUBS AND ESTATES

TRIP GENERATION RATES FOR SOUTH AFRICAN GOLF CLUBS AND ESTATES TRIP GENERATION RATES FOR SOUTH AFRICAN GOLF CLUBS AND ESTATES M M Withers and C J Bester Department of Civil Engineering University of Stellenbosch, Private Bag X1, Matieland, 7602 ABSTRACT There has

More information

Navigate to the golf data folder and make it your working directory. Load the data by typing

Navigate to the golf data folder and make it your working directory. Load the data by typing Golf Analysis 1.1 Introduction In a round, golfers have a number of choices to make. For a particular shot, is it better to use the longest club available to try to reach the green, or would it be better

More information

Age of Fans

Age of Fans Measures of Central Tendency SUGGESTED LEARNING STRATEGIES: Activating Prior Knowledge, Interactive Word Wall, Marking the Text, Summarize/Paraphrase/Retell, Think/Pair/Share Matthew is a student reporter

More information

Major League Baseball Offensive Production in the Designated Hitter Era (1973 Present)

Major League Baseball Offensive Production in the Designated Hitter Era (1973 Present) Major League Baseball Offensive Production in the Designated Hitter Era (1973 Present) Jonathan Tung University of California, Riverside tung.jonathanee@gmail.com Abstract In Major League Baseball, there

More information

(c) The hospital decided to collect the data from the first 50 patients admitted on July 4, 2010.

(c) The hospital decided to collect the data from the first 50 patients admitted on July 4, 2010. Math 155, Test 1, 18 October 2011 Name: Instructions. This is a closed-book test. You may use a calculator (but not a cell phone). Make sure all cell-phones are put away and that the ringer is off. Show

More information

Stats in Algebra, Oh My!

Stats in Algebra, Oh My! Stats in Algebra, Oh My! The Curtis Center s Mathematics and Teaching Conference March 7, 2015 Kyle Atkin Kern High School District kyle_atkin@kernhigh.org Standards for Mathematical Practice 1. Make sense

More information

Analysis of Highland Lakes Inflows Using Process Behavior Charts Dr. William McNeese, Ph.D. Revised: Sept. 4,

Analysis of Highland Lakes Inflows Using Process Behavior Charts Dr. William McNeese, Ph.D. Revised: Sept. 4, Analysis of Highland Lakes Inflows Using Process Behavior Charts Dr. William McNeese, Ph.D. Revised: Sept. 4, 2018 www.spcforexcel.com Author s Note: This document has been revised to include the latest

More information

Quantitative Methods for Economics Tutorial 6. Katherine Eyal

Quantitative Methods for Economics Tutorial 6. Katherine Eyal Quantitative Methods for Economics Tutorial 6 Katherine Eyal TUTORIAL 6 13 September 2010 ECO3021S Part A: Problems 1. (a) In 1857, the German statistician Ernst Engel formulated his famous law: Households

More information

An Empirical Comparison of Regression Analysis Strategies with Discrete Ordinal Variables

An Empirical Comparison of Regression Analysis Strategies with Discrete Ordinal Variables Kromrey & Rendina-Gobioff An Empirical Comparison of Regression Analysis Strategies with Discrete Ordinal Variables Jeffrey D. Kromrey Gianna Rendina-Gobioff University of South Florida The Type I error

More information

Salary correlations with batting performance

Salary correlations with batting performance Salary correlations with batting performance By: Jaime Craig, Avery Heilbron, Kasey Kirschner, Luke Rector, Will Kunin Introduction Many teams pay very high prices to acquire the players needed to make

More information

CHAPTER 2 Modeling Distributions of Data

CHAPTER 2 Modeling Distributions of Data CHAPTER 2 Modeling Distributions of Data 2.2 Density Curves and Normal Distributions The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Density Curves

More information

Descriptive Stats. Review

Descriptive Stats. Review Descriptive Stats Review Categorical Data The Area Principal Distorts the data possibly making it harder to compare categories Everything should add up to 100% When we add up all of our categorical data,

More information

March Madness Basketball Tournament

March Madness Basketball Tournament March Madness Basketball Tournament Math Project COMMON Core Aligned Decimals, Fractions, Percents, Probability, Rates, Algebra, Word Problems, and more! To Use: -Print out all the worksheets. -Introduce

More information

2014 NATIONAL BASEBALL ARBITRATION COMPETITION ERIC HOSMER V. KANSAS CITY ROYALS (MLB) SUBMISSION ON BEHALF OF THE CLUB KANSAS CITY ROYALS

2014 NATIONAL BASEBALL ARBITRATION COMPETITION ERIC HOSMER V. KANSAS CITY ROYALS (MLB) SUBMISSION ON BEHALF OF THE CLUB KANSAS CITY ROYALS 2014 NATIONAL BASEBALL ARBITRATION COMPETITION ERIC HOSMER V. KANSAS CITY ROYALS (MLB) SUBMISSION ON BEHALF OF THE CLUB KANSAS CITY ROYALS Player Demand: $4.00 Million Club Offer: $3.30 Million Midpoint:

More information

Unit 3 ~ Data about us

Unit 3 ~ Data about us Unit 3 ~ Data about us Investigation 3: Data Sets & Displays I can construct, interpret, and compare data sets and displays. I can find, interpret, and compare measures of center and variation for data

More information

Using SAS/INSIGHT Software as an Exploratory Data Mining Platform Robin Way, SAS Institute Inc., Portland, OR

Using SAS/INSIGHT Software as an Exploratory Data Mining Platform Robin Way, SAS Institute Inc., Portland, OR Using SAS/INSIGHT Software as an Exploratory Data Mining Platform Robin Way, SAS Institute Inc., Portland, OR ABSTRACT Data mining has captured the hearts and minds of business analysts seeking a solution

More information

Best Practices in Mathematics Education STATISTICS MODULES

Best Practices in Mathematics Education STATISTICS MODULES Best Practices in Mathematics Education STATISTICS MODULES APEC Technical Assistance & Training Facility (APEC TATF) APEC Project HRD 01/2009A - 21 st Century Mathematics Education for All in the APEC

More information

Internet Technology Fundamentals. To use a passing score at the percentiles listed below:

Internet Technology Fundamentals. To use a passing score at the percentiles listed below: Internet Technology Fundamentals To use a passing score at the percentiles listed below: PASS candidates with this score or HIGHER: 2.90 High Scores Medium Scores Low Scores Percentile Rank Proficiency

More information

Minimal influence of wind and tidal height on underwater noise in Haro Strait

Minimal influence of wind and tidal height on underwater noise in Haro Strait Minimal influence of wind and tidal height on underwater noise in Haro Strait Introduction Scott Veirs, Beam Reach Val Veirs, Colorado College December 2, 2007 Assessing the effect of wind and currents

More information

An Analysis of the Effects of Long-Term Contracts on Performance in Major League Baseball

An Analysis of the Effects of Long-Term Contracts on Performance in Major League Baseball An Analysis of the Effects of Long-Term Contracts on Performance in Major League Baseball Zachary Taylor 1 Haverford College Department of Economics Advisor: Dave Owens Spring 2016 Abstract: This study

More information

8th Grade. Data.

8th Grade. Data. 1 8th Grade Data 2015 11 20 www.njctl.org 2 Table of Contents click on the topic to go to that section Two Variable Data Line of Best Fit Determining the Prediction Equation Two Way Table Glossary Teacher

More information

9.3 Histograms and Box Plots

9.3 Histograms and Box Plots Name Class Date 9.3 Histograms and Box Plots Essential Question: How can you interpret and compare data sets using data displays? Explore Understanding Histograms Resource Locker A histogram is a bar graph

More information

STAT 115 : INTRO TO EXPERIMENTAL DESIGN. Science answers questions with experiments

STAT 115 : INTRO TO EXPERIMENTAL DESIGN. Science answers questions with experiments STAT 115 : INTRO TO EXPERIMENTAL DESIGN Science answers questions with experiments 1 DEFINE THE PROBLEM Begin by asking a question about your topic What is a good question for an experiment? One that is

More information

(per 100,000 residents) Cancer Deaths

(per 100,000 residents) Cancer Deaths Unit 3 Lesson 2 Investigation 2 Radioactive Waste Exposure Cancer Deaths (per 100,000 residents) 250 200 150 100 Name: 50 0 0 5 10 15 Index of Exposure a. Describe the direction and strength of the relationship.

More information

4-3 Rate of Change and Slope. Warm Up Lesson Presentation. Lesson Quiz

4-3 Rate of Change and Slope. Warm Up Lesson Presentation. Lesson Quiz 4-3 Rate of Change and Slope Warm Up Lesson Presentation Lesson Quiz Holt Algebra McDougal 1 Algebra 1 Warm Up 1. Find the x- and y-intercepts of 2x 5y = 20. x-int.: 10; y-int.: 4 Describe the correlation

More information

Gizachew Tiruneh, Ph. D., Department of Political Science, University of Central Arkansas, Conway, Arkansas

Gizachew Tiruneh, Ph. D., Department of Political Science, University of Central Arkansas, Conway, Arkansas Gizachew Tiruneh, Ph. D., Department of Political Science, University of Central Arkansas, Conway, Arkansas [A revised version of the paper is published by the Journal of Quantitative Analysis in Sports,

More information

NUMB3RS Activity: Is It for Real? Episode: Hardball

NUMB3RS Activity: Is It for Real? Episode: Hardball Teacher Page 1 NUMB3RS Activity: Is It for Real? Topic: Data analysis Grade Level: 9-10 Objective: Use formulas to generate data points. Produce line graphs of which inferences are made. Time: 20 minutes

More information

PGA Tour Scores as a Gaussian Random Variable

PGA Tour Scores as a Gaussian Random Variable PGA Tour Scores as a Gaussian Random Variable Robert D. Grober Departments of Applied Physics and Physics Yale University, New Haven, CT 06520 Abstract In this paper it is demonstrated that the scoring

More information

ESP 178 Applied Research Methods. 2/26/16 Class Exercise: Quantitative Analysis

ESP 178 Applied Research Methods. 2/26/16 Class Exercise: Quantitative Analysis ESP 178 Applied Research Methods 2/26/16 Class Exercise: Quantitative Analysis Introduction: In summer 2006, my student Ted Buehler and I conducted a survey of residents in Davis and five other cities.

More information

March Madness Basketball Tournament

March Madness Basketball Tournament March Madness Basketball Tournament Math Project COMMON Core Aligned Decimals, Fractions, Percents, Probability, Rates, Algebra, Word Problems, and more! To Use: -Print out all the worksheets. -Introduce

More information

Returns to Skill in Professional Golf: A Quantile Regression Approach

Returns to Skill in Professional Golf: A Quantile Regression Approach International Journal of Sport Finance, 2010, 5, 167-180, 2010 West Virginia University Returns to Skill in Professional Golf: A Quantile Regression Approach Leo H. Kahane 1 1 Providence College Leo H.

More information