Biostatistics and Research Design in Dentistry
Categorical Data

Reading assignment
Chapter 3, Summarizing data, in Dawson-Trapp, starting with "Summarizing nominal and ordinal data with numbers" on p. 40 through "Tables and graphs for nominal and ordinal data" on p. 47. Recall that Chapter 3 asks: What are the different kinds of data, and how do we use this information to organize and display the data?

Summarizing categorical (nominal) data

Proportion: the part-to-whole fraction. Note that all proportions are between 0 and 1.
How to construct a sentence describing a proportion:
The proportion of (describe the denominator) who (describe the numerator) is a / (a+b) = x.xxx.
Use a sufficient number of decimal places when reporting a proportion, and also report the numerator and denominator.

Percent: the part-to-whole fraction times 100. Note that all percents are between 0 and 100.
How to construct a sentence describing a percent:
The percent of (describe the denominator) who (describe the numerator) is 100 × a / (a+b) = xx.x%.

Rates: the part-to-whole fraction times some other multiplier (for example, per 1,000).

Ratio: the part-to-part fraction. Note that ratios are never negative and, unlike proportions, can exceed 1.
How to construct a sentence describing a ratio:
Among (describe both groups), the ratio of (describe the numerator) to (describe the denominator) is a / b = x.xxx.

Incidence rate is the number of new cases that have occurred during a given interval of time divided by the population at risk. Usually this is estimated by a cohort study or by some disease monitoring/reporting system. That is, sample the population at risk who do NOT now have the disease. Follow them for a fixed period and determine how many new cases appear. Often this is done prospectively.

Prevalence rate is the number of individuals with a disease at a given point in time divided by the population at risk. Usually this is estimated by a cross-sectional study; that is, sample the population at risk. Of those sampled, how many have the disease?
Often this is done retrospectively.

3.5.3 Adjusting rates
These methods are not often used in dental research. The intent is to correct (adjust) rates so that they are comparable. Unadjusted rates may not be comparable because of factors affecting the denominators of the rates. For instance, if one wanted to compare death rates in two populations, and one population was older than the other, the unadjusted death rates would not be comparable.

Tables and Graphs (p. 46)
Frequency table: previously described.
Contingency table: the structure of a contingency table is as follows:
The rows are labeled with the values of one classification variable.
The columns are labeled with the values of a different classification variable.
The cells in the table are usually the count (frequency) of individuals with both characteristics.
Bar chart: drawn like a histogram, except that each bar represents a category rather than an interval of a numerical variable.

Descriptive Statistics for Categorical Variables
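The sentence templates for a proportion, a percent, and a ratio can be sketched in a few lines of Python. This is only an illustration: the function names and the wording passed to them are assumptions, and the counts (15 of 128 general dentists recommending a visit within the first year of life) are taken from the survey reported in Table 1 of this handout.

```python
def describe_proportion(a, b, denominator, numerator):
    """Part-to-whole fraction a / (a + b), reported with its counts."""
    total = a + b
    return (f"The proportion of {denominator} who {numerator} "
            f"is {a}/{total} = {a / total:.3f}.")


def describe_percent(a, b, denominator, numerator):
    """Part-to-whole fraction times 100."""
    total = a + b
    return (f"The percent of {denominator} who {numerator} "
            f"is 100 x {a}/{total} = {100 * a / total:.1f}%.")


def describe_ratio(a, b, groups, numerator, denominator):
    """Part-to-part fraction a / b."""
    return (f"Among {groups}, the ratio of {numerator} to {denominator} "
            f"is {a}/{b} = {a / b:.3f}.")


# a = general dentists recommending a visit within the first year (15)
# b = general dentists recommending a later visit (128 - 15 = 113)
a, b = 15, 128 - 15

print(describe_proportion(a, b, "general dentists surveyed",
                          "recommended a visit within the first year"))
print(describe_percent(a, b, "general dentists surveyed",
                       "recommended a visit within the first year"))
print(describe_ratio(a, b, "general dentists surveyed",
                     "those recommending a visit within the first year",
                     "those recommending a later visit"))
```

Note that the proportion and the percent share the denominator a + b, while the ratio divides one part by the other part, which is why the ratio (15/113) is larger than the proportion (15/128).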
Contingency Tables
Rhea Davis surveyed medical practitioners who may counsel parents on when to schedule a child's first dental visit. Her data may be displayed in a contingency table, also called a cross-tabulation table or a two-way classification.

Table 1 Contingency Table

Age      General Dentist    ...ian    ... Dentist    total
1                     15         6             68       89
2                     33        20             22       75
3                     63        84              1      148
4                     17        11              1       29
total                128       121             92      341

She surveyed n = 128 general dentists, of whom x = 15 recommended a dental visit within the first year of a child's life.

Contingency Table
A contingency table shows the classification of subjects according to two criteria. The rows describe one criterion and the columns the other. The entries in the table are the numbers of observations that meet both criteria.

Constructing a tabular display
When constructing a table to compare proportions between groups, keep these points in mind:

The outcomes (variable values) form the rows. The header for the first column identifies the variable describing the outcomes, and each row's first-column value identifies the specific outcome.
    1    2    3    4    total

The samples (groups) form the columns. The spanner for the columns identifies the variable describing the groups, and each column's first-row value identifies the specific group.
    General Dentist    ...ian    ... Dentist    total
The title of the table should identify the rows and columns.

In a contingency table, the entries are frequencies (counts). Sometimes it is also necessary to tell your reader this, either in the table title or in a footnote.

If other things are in the table (for example, proportions), be sure that the denominator is clear.

Often it's useful to include marginal totals.

There are NO vertical lines in tables.¹ Yuck!

Age   | General Dentist | ...ian | ... Dentist | total
1     |              15 |      6 |          68 |    89
2     |              33 |     20 |          22 |    75
3     |              63 |     84 |           1 |   148
4     |              17 |     11 |           1 |    29
total |             128 |    121 |          92 |   341

Histogram
Another way to display this information is in a histogram (see Figure 1). This is a good illustration of the observation that "...drawing graphs, like motor-car driving and love-making, is one of those activities which almost every researcher thinks he or she can do well without instruction."² A great deal has been written on what makes good or bad graphs,³ and the Figure makes nearly all of the possible mistakes. For more of the best and worst, see: http://www.math.yorku.ca/scs/gallery/.

¹ Vertical rules generally are not used in medical publications. See p. 62 of the American Medical Association Manual of Style, 9th edition, 1998.
² Wainer & Thissen, 1991, Annual Review of Psychology.
³ For instance, WS Cleveland (1994) The Elements of Graphing Data. Hobart Press, Summit NJ. Or, AAM Nicol & PM Pexman (2003) Displaying Your Findings. APA Press, Washington DC.
Figure 1 3-D Histogram
[3-D bar chart: Count (vertical axis, 0 to 90) by recommended age (1 to 4) and practitioner group]

Comparing the heights of these bars does not give sensible comparisons, because the bars show counts and the three groups differ in size (128, 121, and 92).

Tabular Display
Another tabular display would show the characteristic of interest: the proportion recommending each age (in years). Table 2 shows proportions calculated separately for each column.

Table 2 Estimated Proportion recommending X within each group

Age      General Dentist    ...ian    ... Dentist    overall
1                  0.117     0.050          0.739      0.261
2                  0.258     0.165          0.239      0.220
3                  0.492     0.694          0.011      0.434
4                  0.133     0.091          0.011      0.085
total              1.000     1.000          1.000      1.000

Note that the columns sum to 1. That is, each proportion was calculated separately for each population. These proportions answer the questions:
It's sensible to ask whether 0.117 is equal to 0.050 is equal to 0.739; if so, they are all equal to the overall proportion of 0.261.

Other Tables
Other proportions, less sensible or useful perhaps, could be calculated. Table 3 shows the result when the proportions in each row sum to 1.

Table 3 Proportion of each group within each Recommendation

Age      General Dentist    ...ian    ... Dentist    total
1                  0.169     0.067          0.764    1.000
2                  0.440     0.267          0.293    1.000
3                  0.426     0.568          0.007    1.000
4                  0.586     0.379          0.034    1.000
overall            0.375     0.355          0.270    1.000

It's sensible to ask if 0.169 is equal to 0.440 is equal to 0.426 is equal to 0.586; if so, it's equal to the overall proportion of 0.375.

Whole Table Proportions
Here is the last of three ways that proportions may be calculated. We could calculate proportions based on the total N in the whole study. That is, every cell in Table 4 is the corresponding cell count divided by 341.

Table 4 Proportion of the Whole

Age      General Dentist    ...ian    ... Dentist    total
1                  0.044     0.018          0.199    0.261
2                  0.097     0.059          0.065    0.220
3                  0.185     0.246          0.003    0.434
4                  0.050     0.032          0.003    0.085
total              0.375     0.355          0.270    1.000

The first proportion in the "1" row answers the question, "Of everyone in the whole study, what proportion recommended 1 AND were general dentists?" None of the proportions in the above table may be sensibly compared.

Graphical Display
Returning to the proportions shown in Table 2, which figure would you choose to show the comparisons of interest?
[Three candidate figures of the Table 2 proportions:
First figure: Proportion (0.0 to 0.8) plotted against Age (1 to 4), with one series per practitioner group.
Second figure: Proportion (0.0 to 0.8) plotted against practitioner group, with one series per Age (1 to 4).
Third figure: Percent (0% to 100%) plotted against practitioner group, with one series per Age (1 to 4).]
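The three ways of calculating proportions from a contingency table (within each column, within each row, and of the whole table) can be reproduced from the counts in Table 1. The following is a minimal sketch, not the method used to prepare the handout. The function names are assumptions, and because only the first group is named in the text (general dentists), the other two column labels are placeholders.

```python
# Counts from Table 1. Rows are the recommended ages 1 to 4; columns are the
# three practitioner groups in table order. "group_2" and "group_3" are
# placeholder labels, since those column headings are incomplete in the handout.
GROUPS = ["general_dentist", "group_2", "group_3"]
AGES = [1, 2, 3, 4]
COUNTS = [
    [15, 6, 68],
    [33, 20, 22],
    [63, 84, 1],
    [17, 11, 1],
]


def column_proportions(counts):
    """Divide each cell by its column total (the layout of Table 2)."""
    col_totals = [sum(col) for col in zip(*counts)]
    return [[n / t for n, t in zip(row, col_totals)] for row in counts]


def row_proportions(counts):
    """Divide each cell by its row total (the layout of Table 3)."""
    return [[n / sum(row) for n in row] for row in counts]


def whole_table_proportions(counts):
    """Divide each cell by the grand total (the layout of Table 4)."""
    grand_total = sum(sum(row) for row in counts)
    return [[n / grand_total for n in row] for row in counts]


def show(title, table):
    """Print a table of proportions with no vertical rules."""
    print(title)
    print("age  " + "  ".join(f"{g:>15}" for g in GROUPS))
    for age, row in zip(AGES, table):
        print(f"{age:>3}  " + "  ".join(f"{p:>15.3f}" for p in row))
    print()


show("Within each group (columns sum to 1)", column_proportions(COUNTS))
show("Within each recommendation (rows sum to 1)", row_proportions(COUNTS))
show("Of the whole study (all cells sum to 1)", whole_table_proportions(COUNTS))
```

Only the first calculation puts each group on its own denominator, which is why the column proportions are the ones that can be compared across groups that were sampled in different numbers.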