Acknowledgement: Author is indebted to Dr. Jennifer Kaplan, Dr. Parthanil Roy and Dr Ashoke Sinha for allowing him to use/edit many of their slides.



Topic for this lecture
- Today's lecture material can be read from Chapter 3 of the textbook.
- I am going to cover only a part of the textbook in this class, and the part I do not cover is not important for this course.
- Today we shall cover some descriptive statistics of categorical variables.
- In descriptive statistics we summarize data through graphs and tables.

3 Rules of Data Analysis
1. Make a picture: to help you think clearly about the patterns and relationships hiding in your data.
2. Make a picture: to show the important features and unexpected values or patterns in your data.
3. Make a picture: to tell others what your data reveal.

How to display Categorical Data?
- Frequency Tables
- Bar Charts
- Pie Charts
- Contingency Tables

Frequency Tables
- The frequency of a particular data value is the number of times the data value occurs. Thus frequency is simply a count of a particular level.
- In a frequency table, the categories/levels are written in the leftmost column and the corresponding frequencies are written in the second column.
- Sometimes proportions or percentages are written instead of, or in addition to, the actual counts. A proportion is also called a relative frequency.

Frequency Table: An Example
Frequency table of the number of golf balls sold on different days of a week

Day          # of Golf Balls Sold (Frequency)   % of Golf Balls Sold
Monday       17                                 19.54
Tuesday      13                                 14.94
Wednesday    15                                 17.24
Thursday     20                                 22.99
Friday       22                                 25.29
Total        87                                 100
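The percentage column is each day's count divided by the total count, times 100. A minimal Python sketch of that arithmetic, using the golf ball counts from the table (the variable names are illustrative and not part of the slides):

```python
# Daily golf ball sales (frequencies) from the example table.
counts = {
    "Monday": 17,
    "Tuesday": 13,
    "Wednesday": 15,
    "Thursday": 20,
    "Friday": 22,
}

# The total is the sum of all frequencies.
total = sum(counts.values())

# Relative frequency in percent, rounded to two decimals as in the table.
percentages = {day: round(100 * n / total, 2) for day, n in counts.items()}

for day, n in counts.items():
    print(f"{day:<10} {n:>3} {percentages[day]:>6.2f}")
print(f"{'Total':<10} {total:>3}")
```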

Bar Charts
- A bar chart or bar graph is a chart with rectangular bars whose lengths are proportional to their frequencies.
- The bars can be plotted vertically (more common) or horizontally (less common).
- The percentages or relative proportions can also be plotted instead of the actual values.

Bar Chart: Golf Balls Sold
[Figure: vertical bar chart of # of Golf Balls Sold by day, vertical axis from 0 to 25. Bar labels: Monday 17, Tuesday 13, Wednesday 15, Thursday 20, Friday 22.]

The following bar chart represents the incarceration rate (per 100,000 people) of various countries.
[Figure: bar chart of incarceration rates by country.]

Pie Chart
- A pie chart (or a circle graph) is a circular chart divided into sectors, illustrating proportion.
- The arc length of each sector (and consequently its central angle and area) is proportional to the quantity it represents.
- The math is carried out based on the following: 100% is the same as 360 degrees.
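Since 100% corresponds to 360 degrees, the central angle of a sector is its proportion times 360. A short Python sketch, assuming the golf ball counts from the frequency table example (the variable names are illustrative):

```python
# Golf ball counts from the frequency table example.
counts = {
    "Monday": 17,
    "Tuesday": 13,
    "Wednesday": 15,
    "Thursday": 20,
    "Friday": 22,
}
total = sum(counts.values())

# Central angle of each sector: proportion of the whole times 360 degrees.
angles = {day: 360 * n / total for day, n in counts.items()}

for day, angle in angles.items():
    print(f"{day:<10} {angle:6.2f} degrees")
```

The angles add up to 360 degrees, just as the percentages add up to 100.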

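Pie Chart: Golf Balls Sold
[Figure: pie chart of % of Golf Balls Sold by day. Sector labels, matching the rounded percentages in the frequency table: Monday 20%, Tuesday 15%, Wednesday 17%, Thursday 23%, Friday 25%.]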

Pie Chart: An Example
[Figure: pie chart of English native speakers.]

Bar Chart vs. Pie Chart
- A bar chart is used more often to represent the actual frequencies, while a pie chart is used to represent relative proportions (in %).
- When comparison of relative proportions is important, a pie chart is more appropriate.
- When the absolute counts or frequencies are more important, a bar chart should be used.

Major points so far
- First step in organizing data: draw a picture
- Appropriate pictures for categorical data:
  - Pie chart
  - Bar chart

Multiple categorical variables
How to represent two categorical variables in tabular form? Contingency tables, cluster bar plots and stacked bar plots.

Contingency Tables
- A contingency table (also referred to as a cross tabulation or cross tab) is often used to record and analyze the relation between two (or more) categorical variables.
- Here the rows represent the categories of one categorical variable, and the columns represent the categories of the other categorical variable.
- The cells corresponding to row and column entries tabulate the respective frequencies.
- Most often, we have two categorical variables, and we can answer many questions about the data from the contingency table.

Data from STT 200 Class

         Freshmen   Sophomores   Juniors   Seniors   Total
Sec 1    23         2            3         3         31
Sec 2    16         10           2         2         30
Sec 3    2          16           9         2         29
Sec 4    9          14           7         0         30
Total    50         42           21        7         120

- How many sophomores were there in Section 3? 16
- How many students were there in Section 1? 31
- How many seniors were in the class? 7

What proportion of students are freshmen? (Use the STT 200 class table.)
A. About 74%
B. About 46%
C. Exactly 30%
D. About 42%
E. The answer is not given
Solution: (50/120)*100% = 41.67%, i.e. about 42%. Answer: D

What proportion of students in Section 4 are freshmen? (Use the STT 200 class table.)
A. About 74%
B. About 46%
C. Exactly 30%
D. About 42%
E. The answer is not given
Solution: (9/30)*100% = 30%. Answer: C

What proportion of freshmen are in Section 1? (Use the STT 200 class table.)
A. About 74%
B. Exactly 46%
C. Exactly 30%
D. About 42%
E. The answer is not given
Solution: (23/50)*100% = 46%. Answer: B
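The three questions on the STT 200 data differ only in which total is used as the denominator: the grand total, a row total, or a column total. A minimal Python sketch of the three calculations, with the table entered as a nested dictionary (this layout and the variable names are illustrative assumptions, not part of the slides):

```python
# STT 200 class counts: section -> seniority -> number of students.
table = {
    "Sec 1": {"Freshmen": 23, "Sophomores": 2, "Juniors": 3, "Seniors": 3},
    "Sec 2": {"Freshmen": 16, "Sophomores": 10, "Juniors": 2, "Seniors": 2},
    "Sec 3": {"Freshmen": 2, "Sophomores": 16, "Juniors": 9, "Seniors": 2},
    "Sec 4": {"Freshmen": 9, "Sophomores": 14, "Juniors": 7, "Seniors": 0},
}

grand_total = sum(sum(row.values()) for row in table.values())
freshmen_total = sum(row["Freshmen"] for row in table.values())
sec4_total = sum(table["Sec 4"].values())

# Proportion of all students who are freshmen (denominator: grand total).
pct_freshmen = 100 * freshmen_total / grand_total

# Proportion of Section 4 students who are freshmen (denominator: row total).
pct_freshmen_in_sec4 = 100 * table["Sec 4"]["Freshmen"] / sec4_total

# Proportion of freshmen who are in Section 1 (denominator: column total).
pct_sec1_among_freshmen = 100 * table["Sec 1"]["Freshmen"] / freshmen_total

print(round(pct_freshmen, 2), round(pct_freshmen_in_sec4, 2), round(pct_sec1_among_freshmen, 2))
```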

Cluster and stacked bar plots
- Plots can be drawn to compare different groups, or to check whether there is any relation between two categorical variables.
- Two plots are widely used for this purpose:
  - Cluster bar plot
  - Stacked bar plot

Cluster bar plot (seniority on horizontal axis): frequencies
[Figure: clustered bars for Sec 1, Sec 2, Sec 3 and Sec 4 within each of Freshmen, Sophomores, Juniors and Seniors; vertical axis from 0 to 25.]

Cluster bar plot (sections on horizontal axis): frequencies
[Figure: clustered bars for Freshmen, Sophomores, Juniors and Seniors within each of Sec 1, Sec 2, Sec 3 and Sec 4; vertical axis from 0 to 25.]

Stacked bar plot (for seniority): frequencies
[Figure: one stacked bar each for Freshmen, Sophomores, Juniors and Seniors, with segments for Sec 1 to Sec 4; vertical axis from 0 to 60.]

Stacked bar plot (for sections): frequencies
[Figure: one stacked bar each for Sec 1, Sec 2, Sec 3 and Sec 4, with segments for Freshmen, Sophomores, Juniors and Seniors; vertical axis from 0 to 35.]

Are the variables related or independent of each other?
- To see if the categorical variables Seniority and Section are related, it is more suitable to make stacked bar plots of conditional relative frequencies.
- Here is a table of conditional relative frequencies (in percentages) given the seniority:

         Freshmen   Sophomores   Juniors   Seniors
Sec 1    46         4.76         14.29     42.86
Sec 2    32         23.81        9.52      28.57
Sec 3    4          38.1         42.86     28.57
Sec 4    18         33.33        33.33     0
Total    100        100          100       100
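Each entry in the conditional table is a cell count divided by its column (seniority) total, times 100. A Python sketch that reproduces those percentages from the STT 200 counts; the list-based layout and names are illustrative assumptions:

```python
# Column order for each row of counts.
levels = ["Freshmen", "Sophomores", "Juniors", "Seniors"]

# STT 200 class counts per section, in the order given by `levels`.
counts = {
    "Sec 1": [23, 2, 3, 3],
    "Sec 2": [16, 10, 2, 2],
    "Sec 3": [2, 16, 9, 2],
    "Sec 4": [9, 14, 7, 0],
}

# Column totals: number of students at each seniority level.
column_totals = [sum(row[j] for row in counts.values()) for j in range(len(levels))]

# Conditional relative frequency (in %) of each section, given the seniority.
conditional_pct = {
    section: [round(100 * n / total, 2) for n, total in zip(row, column_totals)]
    for section, row in counts.items()
}

for section, row in conditional_pct.items():
    print(section, row)
```

If Section and Seniority were independent, each row would show roughly the same percentage in every column; here the rows vary widely (for example, Sec 1 ranges from 4.76 to 46).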

Stacked bar plots for comparison and finding relation
[Figure: 100% stacked bar plot with one bar each for Freshmen, Sophomores, Juniors and Seniors; vertical axis from 0% to 100%; segments for Sec 1, Sec 2, Sec 3 and Sec 4.]
- Here the segments of the first bar represent the percentages of the different sections, given that the students are freshmen. Similarly, the other bars represent sophomores, juniors and seniors respectively.
- As the segments of the same color have different lengths on different bars, we conclude that sections and seniority are related (i.e. not independent).

Simpson's paradox
- Do not use unfair averages.
- Lurking variable.

Example: Simpson's paradox
- Two pilots: Moe and Jill. We are interested in how often they landed their flights on time.

        Day             Night           Overall
Moe     90 out of 100   10 out of 20    100 out of 120 (83%)
Jill    19 out of 20    75 out of 100   94 out of 120 (78%)

- Moe has a success rate of 83% (100 out of 120).
- Jill has a success rate of 78% (94 out of 120).
- Does it mean Moe has a better success rate than Jill?

Example: Simpson's paradox

        Day                   Night                 Overall
Moe     90 out of 100 (90%)   10 out of 20 (50%)    100 out of 120 (83%)
Jill    19 out of 20 (95%)    75 out of 100 (75%)   94 out of 120 (78%)

- Note that during the day Jill has a success rate of 95% (19 out of 20), which is better than Moe's 90% (90 out of 100).
- Also during the night Jill has a better success rate of 75% (75 out of 100), in comparison to Moe's 50% (10 out of 20).
- So Jill is better than Moe both at day and at night, but worse overall. How is it possible?
- Notice that landing at night is more difficult, and Jill flies mostly at night. In the overall average that fact is not considered, and hence the anomaly.
- So be careful when interpreting the overall average!
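The reversal can be checked directly from the counts. A minimal Python sketch that computes each pilot's on-time rate by time of day and overall (the data layout and the helper name `rate` are illustrative, not from the slides):

```python
# (on-time landings, total flights) for each pilot and time of day.
landings = {
    "Moe": {"Day": (90, 100), "Night": (10, 20)},
    "Jill": {"Day": (19, 20), "Night": (75, 100)},
}


def rate(on_time, flights):
    """On-time rate in percent."""
    return 100 * on_time / flights


rates = {}
for pilot, by_time in landings.items():
    # Rate within each condition (day, night).
    rates[pilot] = {time: rate(*pair) for time, pair in by_time.items()}
    # Overall rate pools the counts, so the condition with more flights dominates.
    total_on_time = sum(on_time for on_time, _ in by_time.values())
    total_flights = sum(flights for _, flights in by_time.values())
    rates[pilot]["Overall"] = rate(total_on_time, total_flights)

for pilot, r in rates.items():
    print(pilot, {key: round(value, 1) for key, value in r.items()})
```

Moe's overall rate is dominated by his 100 easier day flights, while Jill's is dominated by her 100 harder night flights, which is why the pooled comparison points the opposite way from both within-condition comparisons.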