

4. ANALYSING FREQUENCY TABLES

Categorical (nominal) data are usually summarized in frequency tables. Continuous numerical data may also be grouped into intervals and the frequency of observations in each interval may also be summarized in a frequency table (or in a histogram; see earlier lab on "Exploring and Describing Data"). In this lab we will explore two kinds of frequency tables and the ideas they may be used to test.

One-way Frequency Table

The first type of frequency table lists the number of observations in different categories of a single list. An example is the following table evaluating how good humans are at choosing random numbers. The data are from the early years of a US State Lottery, in which players would buy a ticket and choose any number they wanted between 000 and 999. Winnings would be divided between all holders of the winning number, which was chosen randomly. The following data are based on a random sample of 100 players of the Lottery (these are not the winning numbers, but rather they are the numbers selected by players). Listed are the frequencies of numbers chosen that have 0 to 9 as the first digit:

First digit of chosen number   Frequency (f_i)
0                              4
1                              16
2                              14
3                              15
4                              13
5                              8
6                              9
7                              7
8                              8
9                              6
Total                          100

These frequencies may be compared to those predicted by different hypotheses. For example, when players pick a number between 000 and 999, are some first digits more popular than others? It looks like numbers beginning with 0 are unpopular, and those beginning with 1 through 4 are excessively popular. A goodness of fit test is an appropriate method for testing these data against the null hypothesis that there is no preference for different digits in the population.

Two-way (Contingency) Table

The second kind of frequency table is the two-way table, or contingency table. Here, every observation is cross-classified by two category variables instead of just one. The usual goal is to test whether true (population) relative numbers of individuals falling into the different classes for one variable are the same regardless of individual values for the second variable.

An example given below lists the number of survivors and non-survivors in two classes of mountaineers descending from the peak of Mount Everest between 1978 and 1999: those using supplemental oxygen, and those descending without supplemental oxygen. (Most deaths on Mount Everest occur during the descent, not the ascent.)

Survival                  Used supplemental oxygen   Did not use supplemental oxygen
Survived descent          1045                       88
Did not survive descent   32                         8
Total                     1077                       96

(data from Huey and Eguskitza 2000, JAMA 284: 181)

In this case we are interested in knowing whether the relative numbers of survivors and nonsurvivors depends on whether or not supplemental oxygen was used. This is not an experimental study, so we are unable to test whether a difference in survival between classes is caused by oxygen use, but at least we can decide whether supplemental oxygen and survival are associated. The null hypothesis is once again the skeptical point of view: survival and oxygen use are not associated with one another (i.e., survival and oxygen use are independent). A test of differing survival frequencies between the two categories of mountaineers is carried out using a contingency test.

Hypothesis Testing

Forming and testing hypotheses is one of the most basic endeavors in statistical analysis of biological data. With your notes and the course textbook, review your knowledge of the following concepts:

- null hypothesis ($H_0$) and alternate hypothesis ($H_a$)
- Type I errors and Type II errors
- significance level
- degrees of freedom

Test Statistics for Goodness of Fit and Contingency Tests

The chi-squared statistic, $\chi^2$, is a measure of discrepancy between observed and expected frequencies, where expected frequencies are those expected under the null hypothesis. A second measure of discrepancy is the G-statistic (the log likelihood ratio):

$$\chi^2 = \sum_i \frac{(f_i - \hat{f}_i)^2}{\hat{f}_i}$$

$$G = 2 \sum_i f_i \ln\left(\frac{f_i}{\hat{f}_i}\right)$$

where $f_i$ is the observed frequency in class $i$ and $\hat{f}_i$ is the corresponding expected frequency.
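As an illustration, both discrepancy statistics can be computed directly for the lottery digits. The sketch below is written in Python rather than in the lab's JMP IN program, and it assumes the standard forms of the two statistics (sum of squared deviations over expected, and twice the sum of observed times the log of observed over expected), with an expected frequency of 100/10 = 10 for every digit under the null hypothesis of no digit preference. The function and variable names are illustrative only.

```python
import math

# Observed frequencies of first digits 0..9 from the lottery table above.
observed = [4, 16, 14, 15, 13, 8, 9, 7, 8, 6]


def chi_squared(obs, exp):
    """Pearson chi-squared: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


def g_statistic(obs, exp):
    """Log likelihood ratio: 2 * sum of observed * ln(observed / expected).

    Classes with an observed count of zero contribute nothing to the sum.
    """
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(obs, exp) if o > 0)


total = sum(observed)                # 100 players in the sample
n_classes = len(observed)            # 10 possible first digits
# Null hypothesis assumed here: every first digit is equally likely.
expected = [total / n_classes] * n_classes

chi2 = chi_squared(observed, expected)
g = g_statistic(observed, expected)

print(f"chi-squared = {chi2:.2f}")
print(f"G           = {g:.2f}")
```

The two statistics measure the same discrepancy in different ways, so they will generally be close but not identical.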

Under the null hypothesis both statistics have a distribution that conforms approximately to the theoretical chi-squared distribution. The degrees of freedom will usually be $k - 1$, where $k$ is the number of classes of the category variable, except in special situations to be dealt with later in the course.

Analogous statistics contrast observed and expected frequencies in contingency tables. Here, however, the expected frequencies are based on the null hypothesis that relative frequencies are the same in each set. The expected frequency for row $i$ and column $j$ in the contingency table is obtained as

$$\hat{f}_{ij} = \frac{R_i C_j}{N}$$

where $R_i$ and $C_j$ are the row and column totals, respectively, and $N$ is the grand total number of independent observations.

With your notes and the course textbook, review your knowledge of the following concepts:

- independence
- rules of thumb for low expected frequencies in chi-square tests
- Yates correction for continuity [JMP IN does not employ this correction]
- Fisher's exact test

Using the program

In the case of one-way tables, only a single categorical variable is required (e.g., "First digit of chosen number"). Two categorical variables are needed for a two-way (contingency) table (e.g., "Use of supplemental oxygen" and "Survival"). Make sure that after entering the data, the category variable(s) have the nominal attribute (this can be reset in the columns section of the left frame, or by selecting "Column Info" in the "Cols" pull-down menu). The observed frequencies may be entered directly to a new column (call it "observed frequency" or "number of observations").

To produce a bar graph of frequencies from a one-way table, use the "Distribution" menu option and select the categorical variable as your Y column in the pop-up window. In the same window you also need to select the observed frequency column as your "Freq" variable.

To carry out a goodness of fit test, click the red symbol next to the categorical variable name above the bar graph and select "Test Probabilities". This action will open a new display box below the frequency table in the Distribution output window. Here you will need to enter the expected frequencies for your test. Click on each row and enter either the expected frequency or the expected proportion for that row (it doesn't matter which, as long as you are consistent; the goodness of fit test will be carried out using the expected frequencies in either case). Unfortunately, JMP IN doesn't display the expected frequencies it uses to calculate the test statistic, so these will be lacking if you have simply entered the expected proportions. In this case you will be unable to ensure that the expected frequencies are large enough to fulfill the assumptions of the $\chi^2$ goodness of fit test. To calculate expected frequencies you will need to use your own calculator, or better yet the JMP IN calculator.
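Because the expected frequencies are not shown when only proportions are entered, it can help to work them out separately. The following Python sketch (again outside JMP IN) multiplies each hypothesised proportion by the total count and flags any small values. The threshold of 5 is one commonly quoted rule of thumb and is an assumption here, so defer to the course textbook for the rule you are expected to apply. All names are illustrative.

```python
def expected_frequencies(proportions, total):
    """Expected count in each class = hypothesised proportion * total count."""
    if abs(sum(proportions) - 1.0) > 1e-6:
        raise ValueError("hypothesised proportions should sum to 1")
    return [p * total for p in proportions]


# Lottery example: 100 players, each first digit assumed equally likely.
lottery_total = 100
lottery_proportions = [0.1] * 10

expected = expected_frequencies(lottery_proportions, lottery_total)

# Assumed rule of thumb: flag classes whose expected frequency is below 5.
THRESHOLD = 5
too_small = [(digit, e) for digit, e in enumerate(expected) if e < THRESHOLD]

print("expected frequencies:", expected)
print("classes below threshold:", too_small)
```

The same function works for unequal proportions, for example when the expected share of each class is taken from a separate column of the data table.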

To produce a mosaic plot for a two-way (contingency) table, use the "Fit Y by X" menu option. In the pop-up window, select one of the categorical variables as your Y column and the other as your X column. Once again, select the observed frequency column as your "Freq" variable. A two-way table will also appear beneath the mosaic plot, giving the observed frequencies (the program will also display the expected frequencies but you need to select this option by pressing the red symbol next to the "Contingency Table" title). Unfortunately, JMP IN does not include the Yates correction for continuity when the G-test and chi-square (Pearson) tests are carried out on 2x2 tables (you will need to include the Yates correction with 2x2 tables on your assignments and written exams). However, it does include the Fisher exact test, which you can use to validate the results of the chi-square and G tests.

One-way and two-way frequency tables can be constructed from raw data on individual subjects using the "Tables -> Summary" option in the pull-down menu or by selecting "Summary" in the "Tables" tab on the JMP Starter. In the pop-up menu choose one (in one-way tables) or two (for two-way tables) categorical variables and click the "Group" button. Then click the "Statistics" button in the same window and select "N". When you click OK a new data table will appear that tallies the frequency of observations corresponding to each category or combination of categories.

Problems

1. Enter the Lottery data given above and generate the corresponding bar graph.

a) Examine the bar graph. Do the frequencies appear to vary greatly between classes?

b) Carry out a statistical test of the hypothesis that players favor some first digits over others when choosing a number between 000 and 999. In your work, present all steps (i.e., state hypotheses, give the P-value, the significance level for the test, and state your conclusion). Since the computer provides the P-value directly, there is no need to provide the critical value from the tables in Zar.

c) Compare the results for the chi-squared (Pearson) to those for the G test (Likelihood ratio). Why are they different?

d) Compare the results from your visual appraisal of the data to the goodness of fit tests. Which approach provides qualitative information and which one provides quantitative information? What level of uncertainty is associated with those quantitative probabilities?

e) What are the degrees of freedom for these tests? Why do we lose a degree of freedom?

f) Why would it be necessary to alter the analysis if expected values are small?

2. A physical gene map of the human genome was published in 1998 that contained the estimated locations of 30075 human genes. The table below lists the estimated number of genes on each chromosome. The second column lists the fraction of the total human genome made up by each chromosome. For example, the X chromosome constitutes a little more than 5% of the total genome size. These data are in the data file genemap98.jmp on the shared directory.

Chromosome   Proportion of total genome   Observed number of genes
1            0.0834                       3114
2            0.0809                       2257
3            0.0679                       2015
4            0.0644                       1478
5            0.0615                       1529
6            0.0580                       1893
7            0.0542                       1594
8            0.0492                       1206
9            0.0460                       1248
10           0.0457                       1371
11           0.0457                       1755
12           0.0453                       1585
13           0.0311                       703
14           0.0295                       1047
15           0.0282                       1029
16           0.0311                       849
17           0.0292                       1263
18           0.0269                       523
19           0.0212                       1114
20           0.0228                       758
21           0.0123                       305
22           0.0136                       565
X            0.0520                       874

Data from Deloukas et al (1998). A physical map of 30,000 human genes. Science 282:744–746 (see http://www.ncbi.nlm.nih.gov/genemap98/page.cgi?f=genedistrib.html).

a) Display the estimated numbers of genes on different chromosomes using a bar graph and mosaic plot. Describe the differences between chromosomes. Which chromosomes have the most genes? Which have the fewest?

b) We would not expect each chromosome to have the same number of genes because chromosomes differ in size. Use the variable "Proportion of total genome" to calculate the expected number of genes on each chromosome, taking into account chromosome size differences. (Create a new column to receive the expected frequencies, and use the JMP IN calculator to compute them. The total number of genes is 30075. Call the new variable "Expected No. genes".) Generate a bar graph for the expected frequencies and place it beside the bar graph for the observed numbers of genes. Do larger chromosomes tend to have more genes?

c) Generate a new column and compute the following quantity for each chromosome using the JMP IN calculator:

$$\frac{(\text{Observed No. genes}) - (\text{Expected No. genes})}{\sqrt{(\text{Expected No. genes})}}$$

This quantity (sometimes called a z-score) measures the difference between observed and expected frequencies scaled by the square root ($\sqrt{\ }$) of the expected frequencies. On this scale, which chromosomes have a dramatic deficiency of genes for their size? Which chromosomes have the most dramatic excesses?

d) Use the observed and expected frequencies to test the null hypothesis that gene number is determined purely by chromosome size. To have the program do this for you automatically you will need to enter the expected frequencies for each chromosome, one at a time. Alternatively, you could use the JMP IN calculator to compute the chi-square statistic directly. This is easily done by squaring the quantities calculated in (c) and summing them up ("Col Sum" is an option in the "Statistical" functions provided in the Functions panel of the JMP IN calculator window). You can also use the calculator to provide you with the P-value for the calculated $\chi^2$ statistic using "Probability -> ChiSquareDistribution", or you can look up the appropriate critical value in Zar.

Note: The substantially lower than expected gene density on the X chromosome might result from expression bias. Gene expression from the X chromosome is reduced because in females the second copy of the X chromosome is inactivated, and in males a second X is lacking (males are XY). Genes with reduced expression are more difficult to detect by the method the researchers used to find them.

3. Enter the Mount Everest mountaineer survival and supplemental oxygen data from the above table into a JMP IN data table. The most useful way to do this is to create a new data table with four rows and three columns.
Call the first column "Survival" and the second column "Oxygen use". Enter the four combinations of these two variables into the four rows. Finally, put the observed frequencies into the third column.

a) Inspect the mosaic plot for these data. Describe the pattern in words. Are the relative frequencies of individuals surviving similar or different in the two oxygen groups?

b) Test whether survival of mountaineers descending from Mount Everest is significantly associated with use of supplemental oxygen. Show all steps in your work (a good habit, as always).

c) Why does the P-value for the Fisher's exact test differ from that of the Pearson $\chi^2$ and the G tests?

d) Repeat the calculation of the Pearson $\chi^2$ by hand. Did you obtain the same number as JMP IN? Why?

e) Do the expected frequencies satisfy the assumptions of the chi-square test? What strategy do you recommend?

f) The authors who compiled the Everest data also presented results from the teams of mountaineers descending Everest (climbers tend not to go alone). These data are given below. Which data set is the most appropriate to test an association between survival and supplemental oxygen? Why?

Survival                        Used supplemental oxygen   Did not use supplemental oxygen
All team members survived       85                         24
At least one team member died   8                          4
Total                           93                         28

g) Should we conclude from the test in (f) that supplemental oxygen has no effect on survival?

h) The same authors also compiled similar data for K2, a nearby summit in the Himalayas. Analyse these data in the same way as for Mount Everest. Are the results the same as those in (f)?

Survival                        Used supplemental oxygen   Did not use supplemental oxygen
All team members survived       12                         24
At least one team member died   0                          12
Total                           12                         36

4. Open the data file student_data.jmp from the shared directory. This file records the data taken from Biology 300 students on the first day of class, January 2001. The variables are:

- height, Student height in cm
- hand, Student handedness (left or right; "both" was classified as left)
- parent.first, Parent listed first by student when giving their heights (mom or dad)
- mom.height, Student's mother's height, in cm
- dad.height, Student's father's height, in cm
- mom.hand, Whether mother is left or right-handed
- sex, Whether student is male or female

a) Use "Distribution" to test whether male and female students occur with equal frequency in the Bio 300 class. Note that in the pop-up window you will not need to specify a column for the "Freq" button because you are working now with the raw data instead of the frequency table.

b) Use an appropriate method to test whether there is a statistical association between handedness of student (left or right-handed) and that of his/her mother.

c) Use "Tables" to generate a two-way (contingency) table for handedness of student and mother. This method shows how JMP IN may be used to construct frequency tables from raw data.

d) Some students listed their dad first when giving their height, whereas some students listed their mother first. Does this depend on the sex of the student?
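
For the 2x2 contingency problems, a hand calculation can be cross-checked with a short script. The Python sketch below is not part of JMP IN and is only one possible way to organise the arithmetic. It assumes the usual definitions: expected frequency = row total × column total / grand total, the Pearson statistic with an optional Yates correction of 0.5 subtracted from each absolute deviation, a P-value for one degree of freedom obtained from the complementary error function, and a two-sided Fisher exact test that sums the hypergeometric probabilities of all tables no more probable than the observed one. The function names are hypothetical, and the Everest counts are those given earlier in this lab.

```python
import math


def expected_counts(table):
    """Expected frequency for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi2(table, yates=False):
    """Pearson chi-squared; with yates=True, 0.5 is subtracted from each |O - E|."""
    correction = 0.5 if yates else 0.0
    total = 0.0
    for obs_row, exp_row in zip(table, expected_counts(table)):
        for o, e in zip(obs_row, exp_row):
            total += max(0.0, abs(o - e) - correction) ** 2 / e
    return total


def p_value_df1(chi2):
    """Upper-tail probability of the chi-squared distribution with 1 d.f."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact P-value for a 2x2 table (requires Python 3.8+)."""
    (a, b), (c, d) = table
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d
    denom = math.comb(n, c1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return math.comb(r1, x) * math.comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Small tolerance so that ties with the observed table are included.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Rows: survived / did not survive descent. Columns: oxygen / no oxygen.
everest = [[1045, 88], [32, 8]]

expected = expected_counts(everest)
chi2 = pearson_chi2(everest)
chi2_yates = pearson_chi2(everest, yates=True)

print("smallest expected frequency:", round(min(min(r) for r in expected), 2))
print("Pearson chi-squared:", round(chi2, 3), "P =", round(p_value_df1(chi2), 4))
print("with Yates correction:", round(chi2_yates, 3),
      "P =", round(p_value_df1(chi2_yates), 4))
print("Fisher exact P (two-sided):", round(fisher_exact_two_sided(everest), 4))
```

Printing the smallest expected frequency alongside the statistics makes it easy to see whether the chi-square approximation is on safe ground before interpreting the P-values.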