Running head: DATA ANALYSIS AND INTERPRETATION 1
Data Analysis and Interpretation Final Project
Vernon Tilly Jr.
University of Central Oklahoma
Owners of the various Major League Baseball (MLB) teams are interested in learning ways to recruit, select, and retain their best players. Before we address these items of interest, we need some background on the MLB. The MLB is a professional baseball league consisting of teams that play in the American and National leagues. It is one of the major professional sports leagues of the United States and Canada and is composed of 30 teams: 29 in the United States and one in Canada. The MLB has the highest season attendance of any sports league in the world. There are approximately 1,200 players in the league.

We will use various pieces of data to assist the MLB team owners in their efforts to find ways to recruit, select, and retain their best baseball players. We will use descriptive analysis to help them understand the data used. Specific statistical tests will be run, analyzed, and interpreted for Salary, Homeruns (HR), and Batting Average (AVG), in an effort to see whether there is a relationship between a player's Salary and his homeruns, and between Salary and batting average.

To begin, we will use the information presented in Table 1, which lists the variables with their variable types, data types, and measurement scales. This information helps us determine what we are able to do with the data. Take Team, for example: it is a qualitative variable, which means it is non-numerical and descriptive in nature. Homeruns (HR), on the other hand, is a quantitative variable: it is numerical, so its values can be counted and compared meaningfully. We can also denote Team as a cross-sectional data type with a Nominal measurement scale. Cross-sectional simply means the characteristic is recorded across many subjects at one point in time rather than tracked over time. Nominal data is the least sophisticated scale: its values only name categories, so there is not a lot you can do with it mathematically.
Table 1

Variable Name            Variable Type   Data Type         Measurement Scale
Name (of Players)        Qualitative     Cross-sectional   Nominal
Team (Name of Team)      Qualitative     Cross-sectional   Nominal
Salary (in dollars)      Quantitative    Continuous        Ratio
Games Played (G)         Quantitative    Discrete          Ratio
Hits (H)                 Quantitative    Discrete          Ratio
Homeruns (HR)            Quantitative    Discrete          Ratio
Runs Batted In (RBI)     Quantitative    Discrete          Ratio
Batting Average (AVG)    Quantitative    Continuous        Ratio

We will be focusing on three variables, Salary, Homeruns (HR), and Batting Average (AVG), all of which represent quantitative data. Salary and Batting Average (AVG) are continuous data types, which means they can take any of an infinite number of values within an interval. Homeruns (HR), on the flip side, is a discrete data type, which means it takes only countable whole-number values such as 2, 3, and 4; we don't earn half a homerun. The last item
we will address is the measurement scale, as all three have a Ratio scale. The ratio scale is the strongest measurement scale because it has a true zero point: $0.00 means no money. Ratio data also support meaningful mathematical calculations, which we will use to arrive at a conclusion and recommendation for the owners.

Given a baseball data set containing a random sample of 254 players with their respective stats, we will investigate the linear relationship, if any, between baseball players' performance and pay. The performance variables, as discussed, will be batting average (AVG) and homeruns (HR). Table 2 below presents the relative frequency of players by MLB league affiliation. It reflects a relative distribution of the 254 sampled players of 47.2% for the National league and approximately 53% for the American league. This is illustrated as well in Figure 1 for clarity.

Table 2. League, Sample # of Players, and Relative frequency for National, American, and Total.

Figure 1. Percentage of the random sample of 254 players and their MLB League affiliation (National 47%, American 53%).
For the given baseball data set containing a random sample of 254 players with their respective stats, we present the mean, median, mode, skew, and standard deviation for Salary in Figure 2 below, and the mean, standard deviation, and skew for AVG in Figure 3 below. What we want the owners to take away is that the sample Mean represents the average and is pulled toward outliers at either end of the spectrum. The Median, on the other hand, is less sensitive to outliers and gives a truer picture of a typical value. The Mode is the value that occurs most frequently within each variable, such as Salary or AVG. The Standard Deviation measures the amount of dispersion of the sample data points around the central location. We must consider Skew as well, as it reflects how the data values are distributed relative to the Mean; the closer the skew is to zero, the more symmetric the distribution.

Figure 2 (Salary) and Figure 3 (AVG). Descriptive statistics listing Mean, Standard Error, Median, Mode, Standard Deviation, Sample Variance, Kurtosis, Skewness, Range, Minimum, Maximum, Sum, Count, and Confidence Level (95.0%). Count is 254 for both variables; for AVG, the Mode is 0.28 and the Minimum is 0.19.

For the given baseball data set containing a random sample of 254 players with their respective stats, we find the highest paid player is Mr. Jason Giambi, with a salary of $23,428,571 and an average of […]. Mr. Giambi's average falls at approximately the 76th percentile in relation to the other players, so approximately 75% have a lower average and approximately 23% have a higher average, with 3 other players sharing the same average of […]. What we need to note here is that the highest average for this random sample, […], is owned by Mr. Ichiro Suzuki, who has a salary of $12,500,000.
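The summary measures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `describe` and `percentile_rank` and the toy salary list are hypothetical, the data are not the 254-player sample, and the skew formula is the adjusted sample skewness that common spreadsheet tools report.

```python
import statistics


def describe(values):
    """Return the summary measures discussed in the text for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    # Adjusted sample skewness: n / ((n-1)(n-2)) * sum of cubed standardized deviations.
    skew = (n / ((n - 1) * (n - 2))) * sum(((x - mean) / stdev) ** 3 for x in values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": stdev,
        "skew": skew,
    }


def percentile_rank(values, x):
    """Percentage of values that are strictly below x."""
    return 100 * sum(v < x for v in values) / len(values)


# Hypothetical salaries in $1,000s (an assumption, NOT the report's data).
toy_salaries = [400, 400, 900, 1500, 2500, 4000, 12000]
summary = describe(toy_salaries)
print(summary)
print(percentile_rank(toy_salaries, 4000))
```

In the toy list the single large salary pulls the mean well above the median and produces a positive skew, which is the same pattern the report describes for the real salary data.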
Using Figures 2 and 3 above, we find the interval for the mean salary at $594,[…] and for the mean of AVG at […], at the 95% confidence level for the population mean.

To better illustrate the players' salaries, we have Table 3 reflecting the relative frequency and Figure 4, a relative frequency histogram, below. In Table 3 we can see approximately 80% of the players earn less than $9,000,000. In Figure 4 the relative frequency histogram reflects a positively skewed (skewed to the right) distribution with a long tail extending to the right. This shape reflects the presence of a small number of relatively large values.
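The 95% confidence interval for a population mean, mentioned at the start of this passage, can be sketched as follows. This is a minimal illustration, not the report's calculation: `mean_confidence_interval` is a hypothetical helper, the batting averages are made up, and the critical value comes from the normal distribution (spreadsheet tools typically use Student's t, which is nearly identical for a sample as large as 254).

```python
import math
import statistics


def mean_confidence_interval(values, confidence=0.95):
    """Return (lower, upper, margin) for the mean using a normal critical value."""
    n = len(values)
    mean = statistics.mean(values)
    std_error = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    # Two-sided critical value, about 1.96 for 95% confidence (normal approximation).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * std_error
    return mean - margin, mean + margin, margin


# Hypothetical batting averages (an assumption, NOT the report's data).
toy_avg = [0.19, 0.25, 0.26, 0.27, 0.28, 0.28, 0.29, 0.30, 0.31, 0.33]
lower, upper, margin = mean_confidence_interval(toy_avg)
print(f"mean = {statistics.mean(toy_avg):.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The margin is the single plus-or-minus figure that a descriptive statistics summary lists as "Confidence Level (95.0%)"; the interval is the mean minus and plus that margin.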
Table 3. Salary classes (in $1,000s) with Relative, Cumulative, and Cumulative Relative frequencies. Class lower limits: $350, $3,070, $5,790, $8,510, $11,230, $13,950, $16,670, $19,390, and $22,110.

Figure 4. Salary of the Random Sample of 254 MLB Players (x-axis: Salary in $1,000s; y-axis: Percentage of Players).

For clarity, and to offer the owners a different view to consider, Figure 5 below presents the relative frequency polygon for salary. The polygon gives a general idea of the shape of the distribution using the midpoints of the players' salary classes from our random sample and the frequency distribution. It complements our histogram in Figure 4 above. From this
illustration we can see 45% of the players earn less than $5,000,000. It also illustrates that most of the players earn less than $10,000,000, something to think about.

Figure 5. Random Sample of 254 MLB Players Salary (x-axis: Salary in $1,000s; y-axis: Percentage of Players).

To illustrate the players' homeruns, we have Table 4 reflecting the relative frequency and Figure 6, a relative frequency histogram, below. In Table 4 we can see approximately 80% of the players hit fewer than 200 homeruns. In Figure 6 the relative frequency histogram reflects a positively skewed (skewed to the right) distribution with a long tail extending to the right. This shape reflects the presence of a small number of relatively large values.

Table 4. Number of Home Runs by Interval, with Relative, Cumulative, and Cumulative Relative frequencies.
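The relative, cumulative, and cumulative relative columns used in the salary and homerun tables can be built as sketched below. The function name and the ten toy salaries are hypothetical; the class layout (first lower limit of 350, in $1,000s, with nine classes) follows Table 3, and the class width of 2,720 is inferred from the spacing of its lower limits.

```python
def relative_frequency_table(values, start, width, n_classes):
    """Build rows of (lower, upper, count, relative, cumulative, cumulative_relative)."""
    counts = [0] * n_classes
    for v in values:
        idx = int((v - start) // width)  # class index; lower limit is inclusive
        if not 0 <= idx < n_classes:
            raise ValueError(f"value {v} falls outside the class limits")
        counts[idx] += 1

    total = len(values)
    rows = []
    cumulative = 0
    for i, count in enumerate(counts):
        cumulative += count
        lower = start + i * width
        rows.append((lower, lower + width, count, count / total,
                     cumulative, cumulative / total))
    return rows


# Hypothetical salaries in $1,000s (an assumption, NOT the report's data).
toy_salaries = [350, 500, 1200, 2900, 3070, 4100, 6000, 9000, 14000, 23428.571]
table = relative_frequency_table(toy_salaries, start=350, width=2720, n_classes=9)
for lower, upper, count, rel, cum, cum_rel in table:
    print(f"${lower:>6,} - ${upper:>6,}  {count:>2}  {rel:.2f}  {cum:>2}  {cum_rel:.2f}")
```

Plotting the relative column as bars gives the histogram, and plotting it against the class midpoints gives the polygon.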
Figure 6. Homeruns (x-axis: Number of Homeruns).

For clarity, and to offer the owners a different view to consider, Figure 7 below presents the relative frequency polygon for Homeruns. The polygon gives a general idea of the shape of the distribution using the midpoints of the players' homerun classes from our random sample and the frequency distribution. It complements our histogram in Figure 6 above. From this illustration we can see approximately 60% of the players hit fewer than 100 homeruns. It also illustrates that most of the players hit fewer than approximately 250 homeruns, something to think about.

Figure 7. Random Sample of 254 MLB Players Homeruns (x-axis: Number of Homeruns).
To illustrate the players' batting average (AVG), we have Table 5 reflecting the relative frequency and Figure 8, a relative frequency histogram, below. In Table 5 we can see approximately 82% of the players have an average of less than […]. In Figure 8 the relative frequency histogram reflects a negatively skewed (skewed to the left) distribution with a long tail extending to the left. This shape reflects the presence of a small number of relatively small values.

Table 5. Batting Average (AVG) by Interval, with Relative, Cumulative, and Cumulative Relative frequencies.

Figure 8. Batting Average (x-axis: Average; y-axis: Percentage of Players).

For clarity, and to offer the owners a different view to consider, Figure 9 below presents the relative frequency polygon for Batting average (AVG). The polygon gives a general idea of the shape of the distribution using the midpoint of the players' batting average
from our random sample and the frequency distribution. It complements our histogram in Figure 8 above. From this illustration we can see approximately 40% of the players' averages are just below […]. It illustrates that most of the players are within the […] to […] range on batting average, something to think about.

Figure 9. Random Sample of 254 MLB Players Batting Average (AVG) (x-axis: Average (AVG); y-axis: Percentage of Players).

Based on the raw stats data from the random sample of 254 MLB players, and the information presented and interpreted in written and graphical form for Salary, Homeruns (HR), and Average (AVG), we now have a pretty good idea of the individual characteristics of each variable. The question now is whether or not a linear relationship exists between these variables. To examine this we set up hypothesis tests, whereby we will reject the null hypothesis in favor of the alternative if the test leads in that direction, or fail to reject the null and keep the status quo. The null and alternative hypotheses for Salary & HR and for Salary & AVG are stated below.

Salary & HR
Null: H0: There is no relationship.
Alternative: HA: There is a relationship.

Salary & AVG
Null: H0: There is no relationship.
Alternative: HA: There is a relationship.
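One conventional way to write each pair of hypotheses symbolically is in terms of the slope of a simple linear model, Salary = \beta_0 + \beta_1 x + \varepsilon, where x is HR or AVG. The symbols \beta_0, \beta_1, and \varepsilon are notation assumed here for illustration; the report states the hypotheses only in words.

```latex
\begin{aligned}
H_0 &: \beta_1 = 0 \quad \text{(no linear relationship between Salary and } x\text{)} \\
H_A &: \beta_1 \neq 0 \quad \text{(a linear relationship between Salary and } x\text{)}
\end{aligned}
```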
A simple way of comparing two variables is the scatter plot. It can be used to quickly see whether there is a potential relationship between two variables, here Salary paired with Homeruns (HR) and with Average (AVG) from the random sample. This is provided as a precursor to the regression analysis coming up. Based on how the data points are dispersed around the trend line, it looks like there could be a linear relationship between Salary and the independent variables HR and AVG, as presented in Figure 10 and Figure 11 respectively.

Figure 10. Salary to Homeruns (x-axis: Salary, $0 to $25,000,000; y-axis: Homeruns).

Figure 11. Salary to Average (AVG) (x-axis: Salary, $0 to $25,000,000; y-axis: Average).
We will continue our testing by using regression and correlation analysis at the 95% confidence level. Below we have two graphs reflecting our regression test, with Salary as the dependent variable, also known as the response variable, and HR and AVG as the respective independent variables, a.k.a. explanatory variables. Figure 12 represents Salary and Homeruns (HR), and Figure 13 represents Salary and Batting average (AVG).

Figure 12. HR Line Fit Plot (Salary and Predicted Salary against HR, with linear trend line y = 27943x + 1E+06).

We note here that the regression equations and goodness-of-fit values are located on both Fig. 12 and Fig. 13 in the upper right hand corner. For simplicity the equations are listed below, where Y is salary in dollars and x is the explanatory variable:

Salary & HR: Y = 27943x + 1E+06
Salary & AVG: Y = 8E+07x - 2E+07
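To show how a fitted line of this form is obtained and used, here is a short sketch of ordinary least squares for one explanatory variable, followed by a prediction from the rounded HR equation listed above. The function names and the four toy points are hypothetical, and because the chart label rounds the intercept to 1E+06, the predicted salary is only approximate.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted response for a given value of the explanatory variable."""
    return slope * x + intercept


# Toy points that lie exactly on y = 2x + 1 (an assumption, NOT the report's data).
toy_slope, toy_intercept = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(toy_slope, toy_intercept)

# Rounded HR trend line quoted in the text: Y = 27943x + 1E+06.
hr_slope, hr_intercept = 27943, 1_000_000
predicted = predict(hr_slope, hr_intercept, 100)  # salary for 100 homeruns
print(predicted)  # 3794300
```

Read this way, the slope says each additional homerun is associated with roughly $27,943 more in predicted salary; it describes an association in the sample, not a guaranteed raise.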
Figure 13. AVG Line Fit Plot (Salary and Predicted Salary against AVG, with linear trend line y = 8E+07x - 2E+07).

Based on the regression summary, more specifically the coefficients reported with the ANOVA output, we find the respective slope and intercept values. For simplicity they are listed below:

Salary & HR: Slope: […]; Intercept: […]
Salary & AVG: Slope: […]; Intercept: […]

We have run a few more tests, as reflected in Table 6. One is Covariance, which tells us the direction of the linear relationship between two variables. We cannot tell much from this test except that there seems to be a positive linear relationship. The Correlation coefficient is a better measure because it captures both direction and strength. Based on the data in Table 6, the Correlation of Salary to Homeruns of 0.72 indicates a strong positive linear relationship, where 1 represents a perfect positive relationship and 0 represents no linear relationship. The Correlation of Salary to Batting Average (AVG), at 0.39, is still a positive linear relationship, though much weaker than that for Homeruns. Using R Squared, also known as the coefficient of determination, we can state the percentage of variation in Salary explained by each model. We find for the
Salary to Homeruns the model explains 52% of the variation in Salary, leaving 48% unexplained. The model for Salary to Batting average explains 15%, leaving 85% unexplained. This may seem rather weak when spending millions of dollars; however, there is still a linear relationship. To test this we compare each p-value with the 5% level of significance (the 95% confidence level). As stated previously, we will reject the null hypothesis in favor of the alternative if the test leads in that direction, or fail to reject the null and keep the status quo.

Table 6. Covariance, Correlation, P-Value, and R Squared for Salary & Home Runs and for Salary & AVG. P-Value Salary - HR: […]E-06; P-Value Salary - AVG: […]E-07.

Salary & HR: We reject the null hypothesis since the p-value < 0.05.
Salary & AVG: We reject the null hypothesis since the p-value < 0.05.

In conclusion, given p-values below 0.05, the null hypothesis can be rejected for Salary and each of the two independent variables, Homeruns and AVG, at the 5% level of significance. Therefore the decision is to reject the null hypothesis in favor of the alternative hypothesis. The R Squared values support this, with Homeruns coming out ahead of Batting average in strength, though both models show a linear relationship. The bottom line: there is a linear relationship.
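The link between the correlation coefficients and the R Squared percentages quoted above can be checked with a short sketch. The `pearson_r` helper and its toy inputs are hypothetical; the values 0.72 and 0.39 are the rounded correlations reported in the text, and for a model with a single explanatory variable R Squared is simply the correlation squared.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy data (an assumption, NOT the report's data): a perfect positive and a
# perfect negative linear relationship.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))

# Rounded correlations reported in the text.
reported = {"Salary & HR": 0.72, "Salary & AVG": 0.39}
r_squared = {label: r ** 2 for label, r in reported.items()}
for label, value in r_squared.items():
    print(f"{label}: R Squared = {value:.2f}, unexplained = {1 - value:.2f}")
```

Squaring 0.72 gives about 0.52 and squaring 0.39 gives about 0.15, which matches the 52% and 15% of salary variation the two models are said to explain.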
It is recommended that the MLB Team Owners review this report and ask any questions necessary if they need more clarification on its contents. It appears to this analyst that a larger sample may be needed, possibly the entire population of MLB players, since the stats are available. We recommend the owners consider that most of the bang for the buck is below the $10,000,000 level, and that most homeruns are earned below this level as well. While batting average does have a positive linear relationship to salary, it is not strong; there could be other variables to consider, such as age, games played, and so on. It also appears most averages fall in the […] range, so there is not much negotiating room there. Anything paid above $15,000,000 is not a good return on performance.

To recruit new players: most are hungry just to get in the game, and the stats show it, as they want to prove they have what it takes to stay in the Major league. Most seem to have good averages and numerous homeruns at the lower level of cost to the owners. It would appear $5,000,000 or less is a good start for recruiting young talent. To select and keep good players, there is plenty of negotiating room as far as salary goes between $5,000,000 and $15,000,000.
More informationPractice Test Unit 6B/11A/11B: Probability and Logic
Note to CCSD Pre-Algebra Teachers: 3 rd quarter benchmarks begin with the last 2 sections of Chapter 6, and then address Chapter 11 benchmarks; logic concepts are also included. We have combined probability
More informationWas John Adams more consistent his Junior or Senior year of High School Wrestling?
Was John Adams more consistent his Junior or Senior year of High School Wrestling? An investigation into my Dad s high school Wrestling Career Amanda Adams Period 1 Statistical Reasoning in Sports December
More informationAnnouncements. % College graduate vs. % Hispanic in LA. % College educated vs. % Hispanic in LA. Problem Set 10 Due Wednesday.
Announcements Announcements UNIT 7: MULTIPLE LINEAR REGRESSION LECTURE 1: INTRODUCTION TO MLR STATISTICS 101 Problem Set 10 Due Wednesday Nicole Dalzell June 15, 2015 Statistics 101 (Nicole Dalzell) U7
More informationAlgebra 1 Unit 6 Study Guide
Name: Period: Date: Use this data to answer questions #1. The grades for the last algebra test were: 12, 48, 55, 57, 60, 61, 65, 65, 68, 71, 74, 74, 74, 80, 81, 81, 87, 92, 93 1a. Find the 5 number summary
More informationEquation 1: F spring = kx. Where F is the force of the spring, k is the spring constant and x is the displacement of the spring. Equation 2: F = mg
1 Introduction Relationship between Spring Constant and Length of Bungee Cord In this experiment, we aimed to model the behavior of the bungee cord that will be used in the Bungee Challenge. Specifically,
More informationIntroduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA
Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only
More information1. Answer this student s question: Is a random sample of 5% of the students at my school large enough, or should I use 10%?
Econ 57 Gary Smith Fall 2011 Final Examination (150 minutes) No calculators allowed. Just set up your answers, for example, P = 49/52. BE SURE TO EXPLAIN YOUR REASONING. If you want extra time, you can
More informationWeek 7 One-way ANOVA
Week 7 One-way ANOVA Objectives By the end of this lecture, you should be able to: Understand the shortcomings of comparing multiple means as pairs of hypotheses. Understand the steps of the ANOVA method
More informationFundamentals of Machine Learning for Predictive Data Analytics
Fundamentals of Machine Learning for Predictive Data Analytics Appendix A Descriptive Statistics and Data Visualization for Machine learning John Kelleher and Brian Mac Namee and Aoife D Arcy john.d.kelleher@dit.ie
More informationQuantitative Literacy: Thinking Between the Lines
Quantitative Literacy: Thinking Between the Lines Crauder, Noell, Evans, Johnson Chapter 6: Statistics 2013 W. H. Freeman and Company 1 Chapter 6: Statistics Lesson Plan Data summary and presentation:
More informationTRIP GENERATION RATES FOR SOUTH AFRICAN GOLF CLUBS AND ESTATES
TRIP GENERATION RATES FOR SOUTH AFRICAN GOLF CLUBS AND ESTATES M M Withers and C J Bester Department of Civil Engineering University of Stellenbosch, Private Bag X1, Matieland, 7602 ABSTRACT There has
More informationNavigate to the golf data folder and make it your working directory. Load the data by typing
Golf Analysis 1.1 Introduction In a round, golfers have a number of choices to make. For a particular shot, is it better to use the longest club available to try to reach the green, or would it be better
More informationAge of Fans
Measures of Central Tendency SUGGESTED LEARNING STRATEGIES: Activating Prior Knowledge, Interactive Word Wall, Marking the Text, Summarize/Paraphrase/Retell, Think/Pair/Share Matthew is a student reporter
More informationMajor League Baseball Offensive Production in the Designated Hitter Era (1973 Present)
Major League Baseball Offensive Production in the Designated Hitter Era (1973 Present) Jonathan Tung University of California, Riverside tung.jonathanee@gmail.com Abstract In Major League Baseball, there
More information(c) The hospital decided to collect the data from the first 50 patients admitted on July 4, 2010.
Math 155, Test 1, 18 October 2011 Name: Instructions. This is a closed-book test. You may use a calculator (but not a cell phone). Make sure all cell-phones are put away and that the ringer is off. Show
More informationStats in Algebra, Oh My!
Stats in Algebra, Oh My! The Curtis Center s Mathematics and Teaching Conference March 7, 2015 Kyle Atkin Kern High School District kyle_atkin@kernhigh.org Standards for Mathematical Practice 1. Make sense
More informationAnalysis of Highland Lakes Inflows Using Process Behavior Charts Dr. William McNeese, Ph.D. Revised: Sept. 4,
Analysis of Highland Lakes Inflows Using Process Behavior Charts Dr. William McNeese, Ph.D. Revised: Sept. 4, 2018 www.spcforexcel.com Author s Note: This document has been revised to include the latest
More informationQuantitative Methods for Economics Tutorial 6. Katherine Eyal
Quantitative Methods for Economics Tutorial 6 Katherine Eyal TUTORIAL 6 13 September 2010 ECO3021S Part A: Problems 1. (a) In 1857, the German statistician Ernst Engel formulated his famous law: Households
More informationAn Empirical Comparison of Regression Analysis Strategies with Discrete Ordinal Variables
Kromrey & Rendina-Gobioff An Empirical Comparison of Regression Analysis Strategies with Discrete Ordinal Variables Jeffrey D. Kromrey Gianna Rendina-Gobioff University of South Florida The Type I error
More informationSalary correlations with batting performance
Salary correlations with batting performance By: Jaime Craig, Avery Heilbron, Kasey Kirschner, Luke Rector, Will Kunin Introduction Many teams pay very high prices to acquire the players needed to make
More informationCHAPTER 2 Modeling Distributions of Data
CHAPTER 2 Modeling Distributions of Data 2.2 Density Curves and Normal Distributions The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Density Curves
More informationDescriptive Stats. Review
Descriptive Stats Review Categorical Data The Area Principal Distorts the data possibly making it harder to compare categories Everything should add up to 100% When we add up all of our categorical data,
More informationMarch Madness Basketball Tournament
March Madness Basketball Tournament Math Project COMMON Core Aligned Decimals, Fractions, Percents, Probability, Rates, Algebra, Word Problems, and more! To Use: -Print out all the worksheets. -Introduce
More information2014 NATIONAL BASEBALL ARBITRATION COMPETITION ERIC HOSMER V. KANSAS CITY ROYALS (MLB) SUBMISSION ON BEHALF OF THE CLUB KANSAS CITY ROYALS
2014 NATIONAL BASEBALL ARBITRATION COMPETITION ERIC HOSMER V. KANSAS CITY ROYALS (MLB) SUBMISSION ON BEHALF OF THE CLUB KANSAS CITY ROYALS Player Demand: $4.00 Million Club Offer: $3.30 Million Midpoint:
More informationUnit 3 ~ Data about us
Unit 3 ~ Data about us Investigation 3: Data Sets & Displays I can construct, interpret, and compare data sets and displays. I can find, interpret, and compare measures of center and variation for data
More informationUsing SAS/INSIGHT Software as an Exploratory Data Mining Platform Robin Way, SAS Institute Inc., Portland, OR
Using SAS/INSIGHT Software as an Exploratory Data Mining Platform Robin Way, SAS Institute Inc., Portland, OR ABSTRACT Data mining has captured the hearts and minds of business analysts seeking a solution
More informationBest Practices in Mathematics Education STATISTICS MODULES
Best Practices in Mathematics Education STATISTICS MODULES APEC Technical Assistance & Training Facility (APEC TATF) APEC Project HRD 01/2009A - 21 st Century Mathematics Education for All in the APEC
More informationInternet Technology Fundamentals. To use a passing score at the percentiles listed below:
Internet Technology Fundamentals To use a passing score at the percentiles listed below: PASS candidates with this score or HIGHER: 2.90 High Scores Medium Scores Low Scores Percentile Rank Proficiency
More informationMinimal influence of wind and tidal height on underwater noise in Haro Strait
Minimal influence of wind and tidal height on underwater noise in Haro Strait Introduction Scott Veirs, Beam Reach Val Veirs, Colorado College December 2, 2007 Assessing the effect of wind and currents
More informationAn Analysis of the Effects of Long-Term Contracts on Performance in Major League Baseball
An Analysis of the Effects of Long-Term Contracts on Performance in Major League Baseball Zachary Taylor 1 Haverford College Department of Economics Advisor: Dave Owens Spring 2016 Abstract: This study
More information8th Grade. Data.
1 8th Grade Data 2015 11 20 www.njctl.org 2 Table of Contents click on the topic to go to that section Two Variable Data Line of Best Fit Determining the Prediction Equation Two Way Table Glossary Teacher
More information9.3 Histograms and Box Plots
Name Class Date 9.3 Histograms and Box Plots Essential Question: How can you interpret and compare data sets using data displays? Explore Understanding Histograms Resource Locker A histogram is a bar graph
More informationSTAT 115 : INTRO TO EXPERIMENTAL DESIGN. Science answers questions with experiments
STAT 115 : INTRO TO EXPERIMENTAL DESIGN Science answers questions with experiments 1 DEFINE THE PROBLEM Begin by asking a question about your topic What is a good question for an experiment? One that is
More information(per 100,000 residents) Cancer Deaths
Unit 3 Lesson 2 Investigation 2 Radioactive Waste Exposure Cancer Deaths (per 100,000 residents) 250 200 150 100 Name: 50 0 0 5 10 15 Index of Exposure a. Describe the direction and strength of the relationship.
More information4-3 Rate of Change and Slope. Warm Up Lesson Presentation. Lesson Quiz
4-3 Rate of Change and Slope Warm Up Lesson Presentation Lesson Quiz Holt Algebra McDougal 1 Algebra 1 Warm Up 1. Find the x- and y-intercepts of 2x 5y = 20. x-int.: 10; y-int.: 4 Describe the correlation
More informationGizachew Tiruneh, Ph. D., Department of Political Science, University of Central Arkansas, Conway, Arkansas
Gizachew Tiruneh, Ph. D., Department of Political Science, University of Central Arkansas, Conway, Arkansas [A revised version of the paper is published by the Journal of Quantitative Analysis in Sports,
More informationNUMB3RS Activity: Is It for Real? Episode: Hardball
Teacher Page 1 NUMB3RS Activity: Is It for Real? Topic: Data analysis Grade Level: 9-10 Objective: Use formulas to generate data points. Produce line graphs of which inferences are made. Time: 20 minutes
More informationPGA Tour Scores as a Gaussian Random Variable
PGA Tour Scores as a Gaussian Random Variable Robert D. Grober Departments of Applied Physics and Physics Yale University, New Haven, CT 06520 Abstract In this paper it is demonstrated that the scoring
More informationESP 178 Applied Research Methods. 2/26/16 Class Exercise: Quantitative Analysis
ESP 178 Applied Research Methods 2/26/16 Class Exercise: Quantitative Analysis Introduction: In summer 2006, my student Ted Buehler and I conducted a survey of residents in Davis and five other cities.
More informationMarch Madness Basketball Tournament
March Madness Basketball Tournament Math Project COMMON Core Aligned Decimals, Fractions, Percents, Probability, Rates, Algebra, Word Problems, and more! To Use: -Print out all the worksheets. -Introduce
More informationReturns to Skill in Professional Golf: A Quantile Regression Approach
International Journal of Sport Finance, 2010, 5, 167-180, 2010 West Virginia University Returns to Skill in Professional Golf: A Quantile Regression Approach Leo H. Kahane 1 1 Providence College Leo H.
More information