Empirical Example II of Chapter 7


1. We use NBA data. The variables are described below:

    variable name   storage type   display format   variable label
    marr            byte           %9.2f            =1 if married
    wage            float          %9.2f            annual salary, millions $
    exper           byte           %9.2f            years as a professional player
    age             byte           %9.2f            age in years
    coll            byte           %9.2f            years playing at college
    games           byte           %9.2f            average games per year
    minutes         int            %9.2f            minutes per season
    points          float          %9.2f            points per game
    rebounds        float          %9.2f            rebounds per game
    assists         float          %10.2f           assists per game
    draft           int            %9.2f            draft number
    allstar         byte           %9.2f            all-star player
    avgmin          float          %9.2f            minutes per game
    black           byte           %9.2f            =1 if black
    children        byte           %9.2f            =1 if has children
    position        str7                            position of the player

2. We are interested in how a player's position affects points and wage. We need to be careful, because position is a string, categorical (qualitative) variable. Unlike a dummy variable, which takes only two values, position can take three values. In probability theory, we would assume that position follows a multinomial distribution.

    . tab position

       position |      Freq.     Percent        Cum.
    ------------+-----------------------------------
         center |         51       17.41       17.41
        forward |        116       39.59       57.00
          guard |        126       43.00      100.00
    ------------+-----------------------------------
          Total |        293      100.00

Exercise: Can you guess which position has the highest average wage?

3. The bar graph below compares average points across positions:

    . graph bar points, over(position)

[Figure: bar graph of the mean of points (vertical axis, 0 to 10) for center, forward and guard]

Exercise: What is the average number of points per game for centers?

4. The statistical significance of the differences cannot be judged from the graph. To test it, we need to generate a set of dummy variables, one for each position:

    . gen guard = (position=="guard")
    . gen center = (position=="center")
    . gen forward = (position=="forward")
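The handout uses Stata, but the sketch below uses Python to show what the gen commands do. It is a minimal example and rests on one assumption: it rebuilds a position column from the counts in the tab output above, and the row order is arbitrary because only the counts come from the data.

The sketch does three things:
- It creates one 0/1 dummy per position.
- It prints the sample shares. These are also the maximum-likelihood estimates of the multinomial probabilities mentioned in item 2.
- It checks two facts used in the next steps: the guard dummy has 167 zeros and 126 ones, and the three dummies sum to one for every player.

```python
from collections import Counter

# Rebuild a position column with the frequencies reported by `tab position`.
# Only the counts come from the handout; the row order is arbitrary.
position = ["center"] * 51 + ["forward"] * 116 + ["guard"] * 126

# One dummy per category, mirroring: gen guard = (position=="guard"), etc.
dummies = {
    cat: [1 if p == cat else 0 for p in position]
    for cat in ("center", "forward", "guard")
}

n = len(position)
counts = Counter(position)
for cat in sorted(counts):
    share = 100 * counts[cat] / n  # sample share in % (multinomial MLE)
    print(f"{cat:8s} {counts[cat]:4d} {share:6.2f}")

# Equivalent of `tab guard`: how many 0s and 1s the guard dummy has.
guard_tab = Counter(dummies["guard"])
print("guard=0:", guard_tab[0], " guard=1:", guard_tab[1])

# Every player is in exactly one category, so the dummies always sum to 1.
row_sums = {sum(d[i] for d in dummies.values()) for i in range(n)}
print("distinct row sums:", row_sums)
```

Because the set of row sums is just {1}, the three dummies add up to the constant column. This is the source of the dummy variable trap.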

You can use the tab command to verify that the dummies were generated correctly:

    . tab guard

          guard |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |        167       57.00       57.00
              1 |        126       43.00      100.00
    ------------+-----------------------------------
          Total |        293      100.00

5. Because guard + center + forward = 1, a constant, for every player, we cannot use all three dummy variables together with the constant term. Otherwise we would fall into the dummy variable trap, a situation in which perfect multicollinearity arises.

6. Intuitively, because there are only three positions, a player must be a center if he or she is neither a forward nor a guard. In other words, the center dummy is redundant once the forward and guard dummies are included in the regression.

7. So, to avoid the dummy variable trap, we use only two dummy variables along with the constant term:

    . reg points forward guard

          Source |       SS       df       MS              Number of obs =     287
    -------------+------------------------------           F(  2,   284) =    3.36
           Model |  226.926754     2  113.463377           Prob > F      =  0.0363
        Residual |  9602.39007   284  33.8112326           R-squared     =  0.0231
    -------------+------------------------------           Adj R-squared =  0.0162
           Total |  9829.31682   286  34.3682406           Root MSE      =  5.8147

    ------------------------------------------------------------------------------
          points |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         forward |   1.940836   .9782515     1.98   0.048     .0152921    3.866379
           guard |   2.502496   .9707714     2.58   0.010     .5916758    4.413316
           _cons |   8.115686   .8142268     9.97   0.000     6.513001    9.718371
    ------------------------------------------------------------------------------

8. $\hat{\beta}_0 = 8.115686$ is the average points for centers. Center is the base group: the group for which both forward and guard equal zero, i.e., the group whose dummy we drop. $\hat{\beta}_1 = 1.940836$ is the difference in average points between forwards and centers, and $\hat{\beta}_2 = 2.502496$ is the difference in average points between guards and centers. In short, every comparison is made relative to the base group, which in this case is center.

9. Exercise: What is $\hat{\beta}_0$ if we use the command reg points center guard instead? Which group is the base group now?

10. We can test the null hypothesis that position does not matter for points, i.e., that there is no difference between forwards and centers and no difference between guards and centers:

    . test forward guard

     ( 1)  forward = 0
     ( 2)  guard = 0

           F(  2,   284) =    3.36
                Prob > F =    0.0363

The p-value is less than 0.05, so we find evidence that average points differ across positions (position matters for points). Notice that reg reports this F test automatically next to the ANOVA table. It is the F statistic for the overall significance of the regression; see page 152 of the textbook for details.

11. In fields such as biology, people would call position the treatment, and another name for this F test is analysis of variance, or ANOVA for short. Simply put, we can carry out an ANOVA by regressing a variable on a set of dummy variables and testing that all coefficients of the dummy variables equal zero.

12. Alternatively, we can include all three dummy variables in the regression, but then we have to drop the constant term with the noc (noconstant) option:

    . reg points center forward guard, noc

          Source |       SS       df       MS              Number of obs =     287
    -------------+------------------------------           F(  3,   284) =  282.27
           Model |  28631.6899     3  9543.89665           Prob > F      =  0.0000
        Residual |  9602.39007   284  33.8112326           R-squared     =  0.7489
    -------------+------------------------------           Adj R-squared =  0.7462
           Total |    38234.08   287  133.219791           Root MSE      =  5.8147

    ------------------------------------------------------------------------------
          points |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          center |   8.115686   .8142268     9.97   0.000     6.513001    9.718371
         forward |   10.05652   .5422276    18.55   0.000     8.989227    11.12382
           guard |   10.61818    .528613    20.09   0.000     9.577685    11.65868
    ------------------------------------------------------------------------------

The advantage of this regression without an intercept is that it gives the average points for each position directly. The disadvantages are twofold:
- The output no longer shows the differences across positions, or their significance, directly.
- R-squared becomes misleading. Without a constant term, the OLS first-order condition $\sum_{i=1}^{n} \hat{u}_i = 0$ no longer holds, and R-squared is computed from the uncentered total sum of squares. Total SS is 38234.08 here, versus 9829.31682 in the regression with a constant.

13. For position, we can generate a non-string (numeric) categorical variable called pid:

    . encode position, gen(pid)
    . list position pid in 1/5, nolab

         +----------------+
         | position   pid |
         |----------------|
      1. |    guard     3 |
      2. |    guard     3 |
      3. |   center     1 |
      4. |    guard     3 |
      5. |  forward     2 |
         +----------------+

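We can then run the regression based on pid:

    . reg points i.pid, nohe

    ------------------------------------------------------------------------------
          points |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             pid |
              2  |   1.940836   .9782515     1.98   0.048     .0152921    3.866379
              3  |   2.502496   .9707714     2.58   0.010     .5916758    4.413316
                 |
           _cons |   8.115686   .8142268     9.97   0.000     6.513001    9.718371
    ------------------------------------------------------------------------------

Pay attention: the regressor is specified as i.pid. This tells Stata to treat pid as categorical and create the dummies itself, using the lowest level (pid = 1, center) as the base group. The result is the same as the regression that uses forward and guard as regressors.

14. What is wrong with the command reg points pid?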
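The equivalences in items 8, 11, 12 and 13 can be checked numerically. The sketch below is written in Python/NumPy rather than Stata and uses a small hypothetical data set of twelve made-up players, not the NBA sample.

The sketch works through these steps:
1. It codes positions alphabetically (center = 1, forward = 2, guard = 3), as encode did above.
2. It builds i.pid-style dummies with center as the base group.
3. It checks that the intercept is the center mean and that each slope is a difference in group means.
4. It checks that the overall regression F equals the classic one-way ANOVA F.
5. It shows that the constant plus all three dummies is rank deficient (the dummy variable trap).
6. It shows that the noc regression returns the group means.
7. Its last lines apply the same F and R-squared formulas to the sums of squares reported in the tables above.

```python
import numpy as np

# Hypothetical toy data (NOT the NBA sample): points per game by position.
position = np.array(["guard", "center", "forward", "guard", "center",
                     "forward", "guard", "forward", "center", "guard",
                     "forward", "guard"])
points = np.array([10.0, 6.0, 9.0, 13.0, 8.0,
                   11.0, 11.0, 10.0, 10.0, 12.0,
                   12.0, 9.0])
n = len(points)

# encode-style coding: labels sorted alphabetically get codes 1, 2, 3.
labels = sorted(set(position))              # center, forward, guard
pid = np.array([labels.index(p) + 1 for p in position])
k = len(labels)

# i.pid-style dummies: one per level except the base level (pid == 1, center).
D = np.column_stack([(pid == lvl).astype(float) for lvl in range(2, k + 1)])
X = np.column_stack([np.ones(n), D])
beta, *_ = np.linalg.lstsq(X, points, rcond=None)

means = np.array([points[pid == lvl].mean() for lvl in range(1, k + 1)])
print("coefficients:", beta.round(4))   # intercept, forward-center, guard-center
print("group means: ", means.round(4))  # center, forward, guard

# Overall regression F: H0 says all dummy coefficients are zero.
ssr = ((points - X @ beta) ** 2).sum()
sst = ((points - points.mean()) ** 2).sum()
f_reg = ((sst - ssr) / (k - 1)) / (ssr / (n - k))

# Classic one-way ANOVA F: between-group versus within-group variation.
sizes = np.array([(pid == lvl).sum() for lvl in range(1, k + 1)])
ss_between = (sizes * (means - points.mean()) ** 2).sum()
ss_within = sum(((points[pid == lvl] - means[lvl - 1]) ** 2).sum()
                for lvl in range(1, k + 1))
f_anova = (ss_between / (k - 1)) / (ss_within / (n - k))
print("F regression:", round(f_reg, 4), " F ANOVA:", round(f_anova, 4))

# Dummy variable trap: constant plus all three dummies has rank 3, not 4.
X_trap = np.column_stack([X, (pid == 1).astype(float)])
print("rank:", np.linalg.matrix_rank(X_trap), "of", X_trap.shape[1], "columns")

# noc version: all three dummies and no constant -> coefficients are group means.
X_noc = np.column_stack([(pid == lvl).astype(float) for lvl in range(1, k + 1)])
beta_noc, *_ = np.linalg.lstsq(X_noc, points, rcond=None)
print("noc coefficients:", beta_noc.round(4))

# The same formulas applied to the NBA sums of squares reported above.
f_nba = (226.926754 / 2) / (9602.39007 / 284)  # reg points forward guard
r2_noc = 1 - 9602.39007 / 38234.08             # noc run: uncentered Total SS
print("NBA F(2, 284):", round(f_nba, 2), " noc R-squared:", round(r2_noc, 4))
```

With the NBA numbers, the formulas give back the F of 3.36 and the noc R-squared of 0.7489 shown in the Stata output.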

15. Finally, let's see which factors matter for log wage:

    . gen lwage = log(wage)
    (13 missing values generated)

    . reg lwage marr exper minutes points rebounds assists allstar avgmin black guard forward

    ------------------------------------------------------------------------------
           lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            marr |  -.0018594   .0830561    -0.02   0.982    -.1654107    .1616918
           exper |   .0712154   .0122108     5.83   0.000     .0471703    .0952604
         minutes |  -.0000538    .000121    -0.44   0.657     -.000292    .0001844
          points |   .0513876   .0165816     3.10   0.002     .0187356    .0840396
        rebounds |   .0020807    .026334     0.08   0.937    -.0497753    .0539368
         assists |   .0269989   .0322371     0.84   0.403    -.0364813    .0904791
         allstar |  -.2507736   .1620092    -1.55   0.123    -.5697966    .0682493
          avgmin |    .030364   .0159912     1.90   0.059    -.0011254    .0618534
           black |   .1288088   .1006909     1.28   0.202    -.0694682    .3270857
           guard |  -.3711601   .1513513    -2.45   0.015    -.6691958   -.0731244
         forward |  -.0901512   .1149876    -0.78   0.434    -.3165807    .1362783
           _cons |  -1.435384   .1599524    -8.97   0.000    -1.750357   -1.120412
    ------------------------------------------------------------------------------

Exercise: How should each coefficient be interpreted?

16. What are possible reasons that minutes, rebounds and assists are statistically insignificant?
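Because the dependent variable is lwage, each slope is approximately a proportional effect. A one-unit increase in a regressor changes annual salary by about 100·β̂ percent, holding the other regressors fixed. For larger coefficients, particularly dummies such as guard (whose comparison group here is center, since forward is also included), the exact percentage difference implied by the log model is 100[exp(β̂) - 1].

The short Python sketch below applies both formulas to three of the coefficients reported above. Only the coefficient values come from the output; the choice of which three to show is arbitrary.

```python
import math

# Three coefficients from the lwage regression above (base position: center).
coefs = {"exper": 0.0712154, "points": 0.0513876, "guard": -0.3711601}

results = {}
for name, b in coefs.items():
    approx = 100 * b                 # approximate % change in annual salary
    exact = 100 * (math.exp(b) - 1)  # exact % change implied by a log-level model
    results[name] = (approx, exact)
    print(f"{name:7s} approx {approx:7.2f}%   exact {exact:7.2f}%")
```

The two numbers compare as follows:
- For exper they are close: about 7.1% versus 7.4% per extra year.
- For guard the gap is larger: about -37% versus -31% relative to centers with the same other characteristics.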