Organizing Quantitative Data MATH 130, Elements of Statistics I J. Robert Buchanan Department of Mathematics Fall 2018
Objectives At the end of this lesson we will be able to: organize discrete data in tables, construct histograms of discrete data, organize continuous data in tables, construct histograms of continuous data, draw stem-and-leaf plots, draw dot plots, identify the shape of a distribution, draw time-series graphs.
Organizing Discrete Data If there are relatively few values of the variable, we may treat them the same as qualitative data. If there are many values of the variable, we create categories called classes using intervals of numbers.
Few Values of the Variable Example Construct a frequency and relative frequency distribution for the final exam scores of students in an earlier semester of MATH 130. 68 75 76 75 72 71 73 75 74 77 71 76 75 75 76 72 69 72 72 73 68 67 77 73 Remark: the grades range from 67 to 77 (only 11 different possible grades) so we treat the data as if it were qualitative.
Solution
Grade   Frequency   Relative Frequency
67      1           0.0417
68      2           0.0833
69      1           0.0417
70      0           0.0000
71      2           0.0833
72      4           0.1667
73      3           0.1250
74      1           0.0417
75      5           0.2083
76      3           0.1250
77      2           0.0833
Total   24          1.0000
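The table above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `frequency_table` is made up for illustration. It lists every integer grade between the minimum and maximum, so that a grade with no observations (70 here) still appears with frequency 0.

```python
from collections import Counter

# Final exam scores copied from the example above.
scores = [68, 75, 76, 75, 72, 71, 73, 75, 74, 77, 71, 76,
          75, 75, 76, 72, 69, 72, 72, 73, 68, 67, 77, 73]

def frequency_table(data):
    """Return (value, frequency, relative frequency) rows for every integer
    from min(data) to max(data), including values that never occur."""
    counts = Counter(data)
    n = len(data)
    return [(v, counts[v], counts[v] / n)
            for v in range(min(data), max(data) + 1)]

table = frequency_table(scores)
print(f"{'Grade':<8}{'Frequency':<12}{'Relative Frequency'}")
for value, freq, rel in table:
    print(f"{value:<8}{freq:<12}{rel:.4f}")
# The relative frequencies should add up to 1 (up to rounding).
print(f"{'Total':<8}{len(scores):<12}{sum(rel for _, _, rel in table):.4f}")
```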
Histogram Definition A histogram is constructed by drawing rectangles for each class of data. The height of each rectangle is the frequency or relative frequency of the class. The width of each rectangle is the same and the rectangles touch each other.
Example Construct a frequency histogram of the final exam grades presented earlier (repeated below for convenience). 68 75 76 75 72 71 73 75 74 77 71 76 75 75 76 72 69 72 72 73 68 67 77 73
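Histogram of Frequencies [Figure: frequency histogram of the final exam grades; vertical axis: Freq., horizontal axis: Grades.]
Histogram of Relative Frequencies [Figure: relative frequency histogram of the final exam grades; vertical axis: Rel. Freq., horizontal axis: Grades (67 to 77).]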
Continuous Data Continuous data must be organized into intervals of numbers called classes. The lower class limit of a class is the smallest value within the class. The upper class limit of a class is the largest value within the class. The class width is the difference between two consecutive lower class limits. A table is open ended if the first class has no lower class limit or the last class has no upper class limit. There is no best choice of class width. We usually pick a class width which produces 5 to 12 classes.
$$\text{class width} \approx \frac{\text{maximum} - \text{minimum}}{\text{number of classes}}$$
Example Consider the following data representing the length in minutes of final round tennis matches. 50.4 78.2 72.8 56.3 73.1 67.2 89.1 41.7 87.1 77.3 40.1 56.6 66.0 74.1 67.9 53.8 89.3 84.6 68.4 53.7 78.0 80.9 78.9 78.1 Construct a frequency and relative frequency table for the data with five categories.
Solution (1 of 2) It will be helpful if we start by sorting the data in ascending order. 40.1 41.7 50.4 53.7 53.8 56.3 56.6 66.0 67.2 67.9 68.4 72.8 73.1 74.1 77.3 78.0 78.1 78.2 78.9 80.9 84.6 87.1 89.1 89.3
Solution (2 of 2) The minimum and maximum times are 40.1 and 89.3 respectively. Thus if we choose the first lower class limit to be 40 and the class width to be 10, we can summarize the data as follows.
Class     Frequency   Relative Frequency
40-49.9   2           0.0833
50-59.9   5           0.2083
60-69.9   4           0.1667
70-79.9   8           0.3333
80-89.9   5           0.2083
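The choice of class width and the class counts can be checked with a short Python sketch. The helper names `suggested_class_width` and `class_frequencies` are hypothetical, and rounding the width up to a whole number is an assumption made here for convenience: (89.3 - 40.1)/5 = 9.84, which rounds up to 10.

```python
import math

# Match lengths in minutes, copied from the example above.
times = [50.4, 78.2, 72.8, 56.3, 73.1, 67.2, 89.1, 41.7, 87.1, 77.3, 40.1, 56.6,
         66.0, 74.1, 67.9, 53.8, 89.3, 84.6, 68.4, 53.7, 78.0, 80.9, 78.9, 78.1]

def suggested_class_width(data, num_classes):
    """(maximum - minimum) / number of classes, rounded up to a whole number.
    Rounding up is an assumption of this sketch."""
    return math.ceil((max(data) - min(data)) / num_classes)

def class_frequencies(data, first_lower_limit, width, num_classes):
    """Return (lower class limit, frequency, relative frequency) rows."""
    counts = [0] * num_classes
    for x in data:
        # Index of the class whose interval [lower, lower + width) contains x.
        index = int((x - first_lower_limit) // width)
        if not 0 <= index < num_classes:
            raise ValueError(f"{x} falls outside the classes")
        counts[index] += 1
    n = len(data)
    return [(first_lower_limit + i * width, c, c / n)
            for i, c in enumerate(counts)]

width = suggested_class_width(times, 5)
rows = class_frequencies(times, first_lower_limit=40, width=width, num_classes=5)
for lower, freq, rel in rows:
    # The data are recorded to one decimal place, so the upper class limit
    # is taken to be 0.1 below the next lower class limit.
    print(f"{lower}-{lower + width - 0.1:.1f}  {freq:>2}  {rel:.4f}")
```

Starting the first class at 40 rather than at the minimum 40.1 is the same choice made in the solution; any convenient value at or below the minimum would work.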
Histogram of Continuous Data To create a histogram of the data we label the lower class limits on the horizontal axis and the class frequency (or relative frequency) on the vertical axis. Example Construct a histogram of the tennis match data using the frequencies just determined in the previous table.
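Frequency Histogram [Figure: frequency histogram of the tennis match times; vertical axis: Freq., horizontal axis: Time.]
Relative Frequency Histogram [Figure: relative frequency histogram of the tennis match times; vertical axis: Rel. Freq., horizontal axis: Time (lower class limits 40 to 90).]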
Stem-and-Leaf Plots
1. The stem of the graph will consist of the digits to the left of the rightmost digit. The leaf of the graph will be the rightmost digit.
2. Write the stems in a vertical column in increasing order. Draw a vertical line to the right of the stems.
3. Write each leaf corresponding to the stems to the right of the vertical line.
4. Write the leaves in ascending order.
Example Round the tennis match times to the nearest whole minute and draw a stem-and-leaf plot. First round the data given earlier. 40 42 50 54 54 56 57 66 67 68 68 73 73 74 77 78 78 78 79 81 85 87 89 89 The stems are the tens digits of the data and the leaves are the ones digits of the data.
Solution
4 | 02
5 | 04467
6 | 6788
7 | 33478889
8 | 15799
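The four steps for drawing a stem-and-leaf plot can be sketched in Python as follows. The function name `stem_and_leaf` is made up for illustration, and the sketch assumes non-negative whole numbers whose leaf is the ones digit.

```python
from collections import defaultdict

# Tennis match times rounded to the nearest whole minute (from the example).
rounded_times = [40, 42, 50, 54, 54, 56, 57, 66, 67, 68, 68, 73,
                 73, 74, 77, 78, 78, 78, 79, 81, 85, 87, 89, 89]

def stem_and_leaf(data):
    """Return one 'stem | leaves' line per stem for non-negative integers."""
    leaves = defaultdict(list)
    # Sorting first guarantees the leaves of each stem come out in ascending order.
    for x in sorted(data):
        stem, leaf = divmod(x, 10)   # stem: digits left of the rightmost digit
        leaves[stem].append(str(leaf))
    lowest, highest = min(leaves), max(leaves)
    # List every stem between the smallest and largest, even if it has no leaves.
    return [f"{stem} | {''.join(leaves.get(stem, []))}"
            for stem in range(lowest, highest + 1)]

plot = stem_and_leaf(rounded_times)
print("\n".join(plot))
```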
Splitting Stems We may use more than one stem for a class of data.
Original plot:
2 | 8
3 | 459
4 | 666779
5 | 011233455566779999
6 | 001234444555678888899999
7 | 000122333333344444566667899
8 | 0011333777889
9 | 11123444
With stems 5, 6, and 7 split (leaves 0 to 4 on the first row, leaves 5 to 9 on the second):
2 | 8
3 | 459
4 | 666779
5 | 0112334
5 | 55566779999
6 | 001234444
6 | 555678888899999
7 | 000122333333344444
7 | 566667899
8 | 0011333777889
9 | 11123444
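A split-stem plot can be sketched in Python along the following lines. The raw data behind the slide are not given, so this sketch reuses the rounded tennis match times from the earlier example. Unlike the slide, which splits only the crowded stems, this hypothetical `split_stem_and_leaf` function splits every stem into a row for leaves 0 to 4 and a row for leaves 5 to 9.

```python
# Tennis match times rounded to the nearest whole minute (earlier example).
rounded_times = [40, 42, 50, 54, 54, 56, 57, 66, 67, 68, 68, 73,
                 73, 74, 77, 78, 78, 78, 79, 81, 85, 87, 89, 89]

def split_stem_and_leaf(data):
    """Each stem appears twice: first with leaves 0-4, then with leaves 5-9."""
    low, high = {}, {}
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        half = low if leaf <= 4 else high
        half.setdefault(stem, []).append(str(leaf))
    stems = set(low) | set(high)
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        # rstrip() drops the trailing space on rows that have no leaves.
        lines.append(f"{stem} | {''.join(low.get(stem, []))}".rstrip())
        lines.append(f"{stem} | {''.join(high.get(stem, []))}".rstrip())
    return lines

split_plot = split_stem_and_leaf(rounded_times)
print("\n".join(split_plot))
```

Splitting spreads the leaves over more rows, which is mainly useful when a few stems carry most of the observations.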
Dot Plots A dot plot is drawn by placing each observation horizontally in increasing order and placing a dot above the observation each time it is observed. Example Draw a dot plot of the final exam grades presented earlier and repeated below for convenience. 68 75 76 75 72 71 73 75 74 77 71 76 75 75 76 72 69 72 72 73 68 67 77 73
Solution [Figure: dot plot of the final exam grades; horizontal axis starts at 67.]
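A dot plot of the grades can be approximated in plain text. The sketch below is only an illustration (the function name `dot_plot` is made up): it prints one column per grade from the minimum to the maximum and stacks one `*` for each time the grade occurs.

```python
from collections import Counter

# Final exam scores copied from the example.
scores = [68, 75, 76, 75, 72, 71, 73, 75, 74, 77, 71, 76,
          75, 75, 76, 72, 69, 72, 72, 73, 68, 67, 77, 73]

def dot_plot(data):
    """Return text rows: stacked dots on top, the axis of values at the bottom."""
    counts = Counter(data)
    values = list(range(min(data), max(data) + 1))
    width = max(len(str(v)) for v in values)   # column width in characters
    rows = []
    # Build the rows from the tallest stack down to the first dot.
    for level in range(max(counts.values()), 0, -1):
        cells = [f"{('*' if counts[v] >= level else ' '):>{width}}" for v in values]
        rows.append(" ".join(cells).rstrip())
    rows.append(" ".join(f"{v:>{width}}" for v in values))
    return rows

rows = dot_plot(scores)
print("\n".join(rows))
```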
Shapes of Distributions We may describe a variable through the shape of its histogram.
uniform: the frequency of each value of the variable is evenly spread across the values of the variable.
bell-shaped: the highest frequency occurs in the middle and the frequencies tail off to the left and right.
skewed right: the tail to the right of the peak is longer than the tail to the left of the peak.
skewed left: the tail to the left of the peak is longer than the tail to the right of the peak.
Uniform Distribution [Figure: histogram with a uniform shape.]
Bell-Shaped Distribution [Figure: histogram with a bell shape.]
Distribution Skewed Right [Figure: histogram skewed right.]
Distribution Skewed Left [Figure: histogram skewed left.]
Time-Series Graphs If the value of a variable is measured at different points in time, the data are referred to as time-series data. Definition A time-series plot is obtained by plotting the time at which a variable is measured on the horizontal axis and the corresponding value of the variable on the vertical axis. Line segments are then drawn connecting the points.
Example Draw a time-series plot of housing permits issued according to the following table.
Year   Housing Permits (in thousands)
2000   1592.3
2001   1636.7
2002   1747.7
2003   1889.2
2004   2070.1
2005   2155.3
2006   1838.9
2007   1398.4
2008   905.4
2009   583.0
2010   592.9
Time-Series Plot [Figure: time-series plot of housing permits; vertical axis: Permits (0 to 2500, in thousands), horizontal axis: Year (2000 to 2010).]
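For a rough look at the same data without graphing software, the sketch below prints a rotated text version of the plot: one row per year, with a `*` placed in proportion to the number of permits. This is only an approximation of a time-series plot (time runs down the page rather than along the horizontal axis, and no line segments are drawn), and the function name `text_time_series` is hypothetical.

```python
# Housing permits in thousands, copied from the table in the example.
permits = {2000: 1592.3, 2001: 1636.7, 2002: 1747.7, 2003: 1889.2,
           2004: 2070.1, 2005: 2155.3, 2006: 1838.9, 2007: 1398.4,
           2008: 905.4, 2009: 583.0, 2010: 592.9}

def text_time_series(series, width=50):
    """One row per time point; the '*' is offset in proportion to the value,
    with the largest value placed `width` columns from the axis."""
    top = max(series.values())
    lines = []
    for year in sorted(series):
        col = round(series[year] / top * width)
        lines.append(f"{year} |{' ' * col}* {series[year]}")
    return lines

lines = text_time_series(permits)
print("\n".join(lines))

# The year with the most permits is where the plot peaks.
peak_year = max(permits, key=permits.get)
print("Peak year:", peak_year)
```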