Descriptive Statistics
Descriptive Statistics vs Inferential Statistics
  o Descriptive: describing a sample
  o Inferential: making inferences to a larger population
Data = information, but too much information. How do we summarize data?
Distributions
  o Tables vs graphs
  o Measures related to distributions
  o Different measures may paint a slightly different picture (political use of statistics)
Different types of data
  o Nominal
  o Ordinal
  o Cardinal or Interval
Measures of Central Tendency (and the data types each applies to)
  o Mode: Nominal, Ordinal, Cardinal
  o Median: Ordinal, Cardinal
  o Mean: Cardinal
Measures of Dispersion
  o Variance, Standard Deviation: Cardinal
  o Homogeneity vs Heterogeneity
Raw Data
[Screenful of raw, numerically coded survey responses; values omitted.]
Distributions
A way of describing how the data is distributed over its values
  o Frequency of values in sample or in population
  o Symmetric vs asymmetric or skewed
  o Unimodal vs multimodal
Over different types of data
  o Nominal
  o Ordinal
  o Cardinal or Interval
Different ways of displaying
  o Frequency table
  o Bar graph, histogram, (kernel) density plot
  o Cumulative distribution
Nominal Data
Example 1: What is your favorite food?
SPSS: Analyze -> Descriptive Statistics -> Descriptives

Descriptive Statistics (1=italian, 2=greek, 3=chinese, 4=mexican, 5=american, 6=indian)
                      N    Min  Max  Mean    Std. Dev.
Favorite food code    68   0    6    2.6765  2.01073
Valid N (listwise)    68

What's wrong with this table? How do we treat Min = 0?
For nominal data, we're limited in the statistics we can use to summarize the data.
  o Min
  o Max
  o Mode: the category that occurs most often in the data; i.e., has the highest frequency. There may be 0, 1, or >1 modes.
SPSS: Analyze -> Descriptive Statistics -> Frequencies (set options in Statistics, Charts, Format)

Statistics (1=italian, 2=greek, 3=chinese, 4=mexican, 5=american, 6=indian)
  N Valid     68
  N Missing   0
  Mode        1
  Minimum     0
  Maximum     6
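To see why the mode is usable for nominal data while the mean is not, here is a minimal Python sketch. It rebuilds the 68 responses from the counts in the frequency table for this question and then applies a hypothetical relabeling (swapping the codes for italian and indian, which is not something the slides do). The mode still points at the same food, but the mean moves, because the numeric codes are arbitrary labels.

```python
from statistics import mean, multimode

# Favorite-food codes rebuilt from the frequency table's counts.
# Code 0 has no label in the codebook and is kept as-is here.
counts = {0: 4, 1: 30, 2: 1, 3: 9, 4: 5, 5: 11, 6: 8}
codes = [code for code, n in counts.items() for _ in range(n)]

# Hypothetical relabeling: swap the numeric codes for italian (1) and indian (6).
swap = {1: 6, 6: 1}
recoded = [swap.get(c, c) for c in codes]

# The mode follows the category: code 1 before, code 6 after (italian both times).
print(multimode(codes), round(mean(codes), 4))
# The mean changes even though nobody's answer changed.
print(multimode(recoded), round(mean(recoded), 4))
```

The first mean reproduces the 2.6765 reported by Descriptives; the second is a different number for exactly the same answers, which is the problem with averaging category codes.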
Frequency Table
One way of displaying the distribution of the sample data

1=italian, 2=greek, 3=chinese, 4=mexican, 5=american, 6=indian
Category   Freq   Percent
0            4      5.9
1           30     44.1
2            1      1.5
3            9     13.2
4            5      7.4
5           11     16.2
6            8     11.8
Total       68    100.0
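A frequency table like this one is just a count per category plus a percentage of the total. The sketch below is one way to build it in Python; the function name `frequency_table` and the "(no label)" placeholder for code 0 are illustrative choices, and the responses are rebuilt from the counts shown above rather than read from the original data file.

```python
from collections import Counter

LABELS = {1: "italian", 2: "greek", 3: "chinese",
          4: "mexican", 5: "american", 6: "indian"}

def frequency_table(codes):
    """Return (code, label, freq, percent) rows sorted by code."""
    counts = Counter(codes)
    n = len(codes)
    return [(c, LABELS.get(c, "(no label)"), f, round(100 * f / n, 1))
            for c, f in sorted(counts.items())]

# Rebuild the 68 responses from the counts on the slide.
slide_counts = {0: 4, 1: 30, 2: 1, 3: 9, 4: 5, 5: 11, 6: 8}
responses = [c for c, f in slide_counts.items() for _ in range(f)]

rows = frequency_table(responses)
for code, label, freq, pct in rows:
    print(f"{code:>4}  {label:<10} {freq:>4} {pct:>6.1f}")
```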
Bar Graph
Information often easier to digest
[Bar graph: Frequency (y-axis, 0 to 40) by response code 0 to 6 (x-axis); 1=italian, 2=greek, 3=chinese, 4=mexican, 5=american, 6=indian]
Min = 0, Mode = 1, Max = 6
Dispersion?
After you've eaten.
Example 2: When you use a public toilet, do you flush with 1. hand, 2. foot, 3. other?

1=hand, 2=foot, 3=other
                  Frequency   Percent   Valid Percent
Valid    0             1        1.5         1.5
         1            24       35.3        35.8
         2            40       58.8        59.7
         3             2        2.9         3.0
         Total        67       98.5       100.0
Missing  System        1        1.5
Total                 68      100.0
[Bar graph: Frequency (y-axis, 0 to 50) by response code 0 to 3 (x-axis); 1=hand, 2=foot, 3=other]
Again, for nominal data, the bar graph provides a lot of information in a relatively compact way.
Min = 0, Mode = 2, Max = 3
Dispersion?
Ordinal Data
Rankings
Example: I ____ of President Bush's handling of the war in Iraq.
(a) Str App (b) Somewhat App (c) Indiff (d) Somewhat Disapp (e) Str Disapp

Descriptive Statistics (1=str agree, 2=some agree, 3=indif, 4=some disag, 5=str disag)
                      N    Min  Max  Mean    Std. Dev.
Response code         68   1    5    3.4853  1.49119
Valid N (listwise)    68

On average, you're all pretty indifferent about Bush and the war in Iraq, right?
Median
Arrange the numbers in order of their magnitude (i.e., from smallest to largest).
For an odd number of values, the median is the middle value.
  o Ex: 2, 3, 3, 5, 5, 6, 7, 7, 10, 201, 987
  o Median = 6
  o What is the mode?
For an even number of values, the median is the average of the middle two values.
  o Ex: 2, 3, 3, 5, 5, 7, 7, 10, 201, 987
  o Median = (5+7)/2 = 6
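The two rules above can be checked with a short Python sketch. `median_by_hand` is an illustrative helper that follows the slide's procedure step by step; `statistics.median` and `statistics.multimode` from the standard library are used alongside it as a cross-check and to answer the mode question.

```python
from statistics import median, multimode

odd = [2, 3, 3, 5, 5, 6, 7, 7, 10, 201, 987]   # 11 values
even = [2, 3, 3, 5, 5, 7, 7, 10, 201, 987]     # 10 values

def median_by_hand(values):
    """Sort, then take the middle value or the average of the middle two."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_by_hand(odd), median(odd))    # 6 6
print(median_by_hand(even), median(even))  # 6.0 6.0
print(multimode(odd))                      # [3, 5, 7]
```

The first example has three modes, since 3, 5, and 7 each occur twice.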
Example: I ____ of President Bush's handling of the war in Iraq.
(a) Str App (b) Somewhat App (c) Indiff (d) Somewhat Disapp (e) Str Disapp

Statistics (1=str agree, 2=some agree, 3=indif, 4=some disag, 5=str disag)
  N Valid     68
  N Missing   0
  Median      4
  Mode        5
  Minimum     1
  Maximum     5
1=str agree, 2=some agree, 3=indif, 4=some disag, 5=str disag
               Freq   Percent   Cumulative Percent
Valid  1         8     11.8          11.8
       2        17     25.0          36.8
       3         2      2.9          39.7
       4        16     23.5          63.2
       5        25     36.8         100.0
       Total    68    100.0

Odd: very few actually seem to be indifferent.
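The cumulative percent column also shows where the median falls: it is the first category at which the running total reaches half the sample. A minimal Python sketch using the frequencies from the table above (the variable names are illustrative):

```python
from itertools import accumulate

# Frequencies from the table: response code -> count.
freqs = {1: 8, 2: 17, 3: 2, 4: 16, 5: 25}
n = sum(freqs.values())

cum_counts = list(accumulate(freqs.values()))
cum_pct = [round(100 * c / n, 1) for c in cum_counts]
print(cum_pct)  # [11.8, 36.8, 39.7, 63.2, 100.0]

# First category whose cumulative count reaches n/2. If a cumulative count
# landed exactly on n/2 with n even, the two middle values would straddle
# two categories; that does not happen with these frequencies.
median_category = next(cat for cat, c in zip(freqs, cum_counts) if c >= n / 2)
print(median_category)  # 4
```

Category 3 is passed at 39.7% and category 4 carries the total past 50%, so the median is 4 even though the mean is about 3.5.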
[Bar graph: Frequency (y-axis, 0 to 30) by response code 1 to 5 (x-axis); 1=str agree, 2=some agree, 3=indif, 4=some disag, 5=str disag]
Bimodal distribution
Example 2: I ____ of President Bush's handling of the economy.
(a) Str App (b) Somewhat App (c) Indiff (d) Somewhat Disapp (e) Str Disapp

Statistics (1=str agree, 2=some agree, 3=indif, 4=some disag, 5=str disag)
  N Valid     68
  N Missing   0
  Median      4
  Mode        5
  Minimum     1
  Maximum     5
[Bar graph: Frequency (y-axis, 0 to 30) by response code 1 to 5 (x-axis); 1=str agree, 2=some agree, 3=indif, 4=some disag, 5=str disag]
Cardinal/Interval Data
Numbers on the real line
Not just ranked categories, but intensity
(Arithmetic) Mean = sum of all values divided by the number of observations
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = (X_1 + X_2 + \dots + X_n)/n$
Ex: 2, 3, 3, 5, 5, 6, 7, 7, 10, 201, 987
  o Modes = 3, 5, 7
  o Median = 6
  o Mean = (2+3+3+5+5+6+7+7+10+201+987)/11 = 112.36
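The example shows how two large values pull the mean far from the median. A small Python sketch makes the comparison explicit; dropping the two largest values is done here only for illustration and is not a recommendation about how to handle outliers.

```python
from statistics import mean, median

data = [2, 3, 3, 5, 5, 6, 7, 7, 10, 201, 987]

print(round(mean(data), 2), median(data))        # 112.36 6

# Same data without the two largest values (201 and 987).
trimmed = data[:-2]
print(round(mean(trimmed), 2), median(trimmed))  # 5.33 5
```

The median barely moves (6 to 5), while the mean drops from 112.36 to 5.33.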
Variance
How dispersed or spread out the data is
Homogeneous vs heterogeneous
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
Standard Deviation = square root of variance = $s$
Ex: 2, 3, 3, 5, 5, 6, 7, 7, 10, 201, 987
  o Variance = 87,599.45
  o Std Dev = 295.97
Ex: 2, 3, 3, 5, 5, 6, 7, 7, 10
  o Variance = 6.25
  o Std Dev = 2.5
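Both worked examples can be reproduced in Python. The helper `sample_variance` is an illustrative implementation that divides by n - 1, which is the divisor that matches the numbers on the slide; `statistics.variance` and `statistics.stdev` use the same divisor, while `statistics.pvariance` divides by n.

```python
from math import sqrt
from statistics import pvariance, stdev, variance

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

with_outliers = [2, 3, 3, 5, 5, 6, 7, 7, 10, 201, 987]
without_outliers = [2, 3, 3, 5, 5, 6, 7, 7, 10]

for data in (with_outliers, without_outliers):
    s2 = sample_variance(data)
    print(round(s2, 2), round(sqrt(s2), 2))
# 87599.45 295.97
# 6.25 2.5

# Dividing by n instead of n - 1 always gives a somewhat smaller number.
print(pvariance(with_outliers) < variance(with_outliers))  # True
```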
Skewed Distributions
mean = median: symmetric distribution
mean ≠ median: asymmetric or skewed distribution
  o mode < median < mean: skewed to the right (positive)
  o mean < median < mode: skewed to the left (negative)
Skewness ≈ (mean − mode)/s ≈ 3(mean − median)/s
Ex: Housing prices; Salary (Bill Gates?); Politics
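As a rough sketch, the median-based version of the skewness formula can be computed in Python as below. The function name is illustrative, the left-skewed data set is simply the mirror image of the running example, and statistical packages such as SPSS report a moment-based skewness statistic, so their output will generally differ from this number.

```python
from statistics import mean, median, stdev

def pearson_median_skewness(values):
    """3 * (mean - median) / s, with s the sample standard deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)

right = [2, 3, 3, 5, 5, 6, 7, 7, 10, 201, 987]  # long tail of large values
left = [-x for x in right]                      # mirror image: long tail of small values
symmetric = [1, 2, 3, 4, 5]

print(round(pearson_median_skewness(right), 2))      # about 1.08: positive, skewed right
print(round(pearson_median_skewness(left), 2))       # same size, negative: skewed left
print(round(pearson_median_skewness(symmetric), 2))  # 0.0: mean equals median
```

The mode-based version is harder to apply to this example because the data have three modes (3, 5, and 7).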
Ex: Height of students in this class

Statistics: HT_FT
  N Valid                  68
  N Missing                0
  Mean                     5.69
  Median                   5.83
  Mode                     5.83
  Std. Deviation           0.760
  Variance                 0.578
  Skewness                 -6.387
  Std. Error of Skewness   0.291
  Minimum                  0
  Maximum                  6.5

Mean, Median, and Mode all about the same, but heavily skewed?
Recoded 0 as missing value

Statistics: HT_FT
  N Valid                  67
  N Missing                1
  Mean                     5.77
  Median                   5.83
  Mode                     5.83
  Std. Deviation           0.299
  Variance                 0.089
  Skewness                 -0.118
  Std. Error of Skewness   0.293
  Minimum                  5.00
  Maximum                  6.50
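The effect of recoding can be sketched in Python with a small hypothetical sample (these are not the class's heights). A single 0 that really means "no answer" drags the mean down and inflates the standard deviation; treating it as missing removes it from the calculation. The helper names `recode_missing` and `valid` are illustrative.

```python
from statistics import mean, stdev

# Hypothetical heights in feet; the 0 stands in for a non-response coded as 0.
heights = [5.0, 5.5, 5.67, 5.83, 5.83, 6.0, 6.17, 0]

def recode_missing(values, missing_code=0):
    """Replace the missing-value code with None."""
    return [None if v == missing_code else v for v in values]

def valid(values):
    """Keep only the non-missing observations."""
    return [v for v in values if v is not None]

recoded = recode_missing(heights)
clean = valid(recoded)

print(len(heights), round(mean(heights), 2), round(stdev(heights), 2))
print(len(clean), round(mean(clean), 2), round(stdev(clean), 2))
```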
Bar Charts vs Histograms (and Kernel Densities)
[Histogram of HT_FT with 7 bins: Frequency (y-axis, 0 to 30) by height in feet (x-axis, 5.00 to 6.50 in steps of 0.25). Std. Dev = .30, Mean = 5.78, N = 67]
[Histogram of HT_FT with 21 bins: Frequency (y-axis, 0 to 12) by height in feet (x-axis, 4.92 to 6.58). Std. Dev = .30, Mean = 5.78, N = 67]
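The two histograms show the same data at different bin widths. The Python sketch below illustrates the idea with a hypothetical set of heights (not the class data) and an illustrative `histogram` helper. The bin range of 4.875 to 6.625 feet is an assumption inferred from the axis labels of the two charts.

```python
def histogram(values, n_bins, lo, hi):
    """Count values into n_bins equal-width bins covering [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # A value equal to hi would index one past the end, so clamp it.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical heights in feet, all inside the assumed range.
heights = [5.0, 5.17, 5.25, 5.42, 5.5, 5.5, 5.67, 5.67, 5.75, 5.83,
           5.83, 5.83, 5.92, 6.0, 6.0, 6.17, 6.25, 6.5]

coarse = histogram(heights, 7, 4.875, 6.625)   # wide bins, smooth shape
fine = histogram(heights, 21, 4.875, 6.625)    # narrow bins, more detail and more gaps

for name, counts in (("7 bins", coarse), ("21 bins", fine)):
    print(name)
    for i, c in enumerate(counts):
        print(f"  bin {i:>2}: {'#' * c}")
```

Fewer bins hide detail; more bins show it but leave many bins empty or nearly empty, so the choice of bin count changes the picture the reader sees.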
Ex: Number of Roommates

Statistics: ROOMATES
  N Valid                  68
  N Missing                0
  Mean                     1.11
  Median                   0
  Mode                     0
  Std. Deviation           1.652
  Variance                 2.732
  Skewness                 1.60
  Std. Error of Skewness   0.29
  Minimum                  0
  Maximum                  6
[Histogram of ROOMATES: Frequency (y-axis, 0 to 40) by number of roommates (x-axis, 0 to 6). Std. Dev = 1.65, Mean = 1.1, N = 68]
Skewed to the right
Some comments on the uses and abuses of descriptive statistics
Each statistic is an equation or rule for calculating a number. Each may paint a different picture.
What is typical? Or normal?
  o Mean? Median? Mode?
  o What about when there's a large variance?
  o What about bimodal distributions? Skewed distributions?
When claims are made based on statistics, ask:
  o Is the statistic appropriate for the type of data?
  o What other aspects of the distribution are being omitted?
Important for political science research, but especially for examining claims when someone is trying to sell you something (e.g., a politician or an advertiser).
Gary Jacobson. 1987. The Marginals Never Vanished. AJPS.
Do incumbents have an advantage in elections?
  o Do they win elections more easily?
  o Do they win elections more often?
Has electoral competition declined?
Marginal: how close two candidates are in an election.
Incumbents seem to be winning more votes on average in 1982 than in 1952.
However, what matters is winning or losing, not by how much. Recasts the question: Are incumbents winning more often?
Freshman Incumbents
Senior Incumbents
Jacobson's Conclusion
No net change in overall security for incumbents
  o First-term incumbents safer
  o Senior incumbents not
Political scientists have been puzzling over this, while actual politicians have been behaving as if they were just as vulnerable as ever.
Grofman, Koetzle, McGann. 2002. Congressional Leadership 1965-96
Are congressional leaders more or less extreme than their followers?
Two theories:
  o More extreme
  o Centrist
Conclusion: Party leaders not necessarily centrist, but drawn from party mode.