Name:

Homework Exercises
Problem Set 1 (chapter 2)

Exercise 2.5.1 The 10 third-grade students at Lake Wobegon elementary school took a spelling test with 10 words, each worth one point. Given that the average score was exactly 9, what is the maximum number of children who could have scored above average?

Exercise 2.5.2¹ Ten people in a room have an average height of 5 feet 6 inches. An 11th person, who is 6 feet 5 inches tall, enters the room. Find the average height of all 11 people.

Exercise 2.5.3 Twenty-one people in a room have an average height of 5 feet 6 inches. A 22nd person, who is 6 feet 5 inches tall, enters the room. Find the average height of all 22 people. Compare to the previous exercise.

Exercise 2.5.4 Twenty-one people in a room have an average height of 5 feet 6 inches. A 22nd person enters the room. How tall would he have to be to raise the average height by 1 inch?

Exercise 2.5.5
(a) Find the average and the r.m.s. size of the numbers on the list 1, 3, 5, 6, 3.
(b) Do the same for the list 11, 8, 9, 3, 15.

¹ From Statistics, by Freedman, Pisani and Purves

Exercise 2.5.6 Guess whether the r.m.s. size of each of the following lists of numbers is around 1, 10, or 20. No arithmetic is necessary.
(a) 1, 5, 7, 8, 10, 9, 6, 5, 12, 17
(b) 22, 18, 33, 7, 31, 12, 1, 24, 6, 16
(c) 1, 2, 0, 0, 1, 0, 0, 3, 0, 1

Exercise 2.5.7
(a) Find the r.m.s. size of the list 7, 7, 7, 7.
(b) Repeat, for the list 7, −7, 7, −7.

Exercise 2.5.8 Each of the numbers 103, 96, 101, 104 is almost 100, but is off by some amount. Find the r.m.s. size of the amounts off.

Exercise 2.5.9 The list 103, 96, 101, 104 has an average. Find it. Each number in the list is off the average by some amount. Find the r.m.s. size of the amounts off.

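The r.m.s. exercises above all follow one recipe, read backwards from the name: square the entries, take their mean, then take the root. The following is a minimal Python sketch of that recipe, not a prescribed method. The function names are hypothetical, the example list is illustrative and not taken from any exercise, and the sketch assumes the convention of dividing by the number of entries.

```python
import math

def average(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def rms_size(values):
    """Root-mean-square size: square the entries, average the squares, take the root."""
    return math.sqrt(average([v * v for v in values]))

def rms_of_deviations(values, center=None):
    """r.m.s. size of the amounts off from `center`.

    If `center` is omitted, the list's own average is used (an assumption
    of this sketch: deviations are averaged by dividing by the list length).
    """
    if center is None:
        center = average(values)
    return rms_size([v - center for v in values])

# Illustrative list, not taken from any exercise.
example = [0, 5, -8, 7, -3]
print(average(example))
print(rms_size(example))
print(rms_of_deviations(example))
```

Passing an explicit `center` measures the amounts off from a chosen reference value, while omitting it measures them from the list's own average.
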
Exercise 2.5.10 Each of the following lists has an average of 50. For which one is the spread of the numbers around the average biggest? Smallest?
(i) 0, 20, 40, 50, 60, 80, 100
(ii) 0, 48, 49, 50, 51, 52, 100
(iii) 0, 1, 2, 50, 98, 99, 100

Exercise 2.5.11 Each of the following lists has an average of 50. For each one, guess whether the SD is closest to 1, 2 or 10. (This does not require any calculation.)
(a) 49, 51, 49, 51, 49, 51, 49, 51, 49, 51
(b) 48, 52, 48, 52, 48, 52, 48, 52, 48, 52
(c) 48, 51, 49, 52, 47, 52, 46, 51, 53, 51
(d) 54, 49, 46, 49, 51, 53, 50, 50, 49, 49
(e) 60, 36, 31, 50, 48, 50, 54, 56, 62, 53

Exercise 2.5.12 Which of the following lists has the larger range? Which has the larger SD?
(A) 7, 9, 10, 10, 10, 11, 13
(B) 8, 8, 8, 10, 12, 12, 12

Exercise 2.5.13
(a) A company gives a flat raise of $90 per month to all employees. How does this change the average monthly salary of the employees? How does it change the SD?
(b) If the company instead gave employees a 3% raise, how would that change the average monthly salary? How would it change the SD?

Exercise 2.5.14 What is the r.m.s. size of the list 17, 17, 17, 17, 17? What is the SD?

Exercise 2.5.15 For the list 107, 98, 93, 101, 104, which is smaller, the r.m.s. size or the SD?

Exercise 2.5.16 Can the SD ever be negative?

Exercise 2.5.17 For a list of positive numbers, can the SD ever be larger than the average?

Exercise 2.5.18 (Exam scores) Suppose a class at a university has two lab sections. One section has 18 students, who score an average of 80 on their exam. The other section has 12 students, who score an average of 75. What is the average for the whole class?

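When groups of different sizes are pooled, each group's average counts in proportion to its size. The sketch below shows one way to write that in Python. The function name and the group sizes and averages are hypothetical, chosen only for illustration.

```python
def combined_average(sizes, averages):
    """Average of the pooled groups: total of all scores divided by total count.

    Each group's total is recovered as size * average, so larger groups
    pull the combined average toward their own average.
    """
    total = sum(n * a for n, a in zip(sizes, averages))
    return total / sum(sizes)

# Hypothetical groups: 20 people averaging 90 and 10 people averaging 60.
print(combined_average([20, 10], [90, 60]))  # 80.0
```

Here the plain average of the two group averages would be 75, which differs from the pooled value because the groups are not the same size.
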
Exercise 2.5.19 A record of litter sizes (live-born pups) for a breeding colony of mice is shown below.

Pups per litter:   1   2   3   4   5   6   7   8   9  10  11
Litters:           2  10  14  34  36  32  31  24  14  10   4

a) What is the total number of litters?
b) What is the total number of pups?
c) Calculate the mean litter size.

Exercise 2.5.20 Note that we can represent the mean litter size as

$\bar{x} = \frac{\sum_x x \, w_x}{\sum_x w_x}$

where $x \in \{1, 2, \ldots, 11\}$ is the number of pups per litter, and $w_x$ is the number of litters of size $x$ (i.e. the frequency, or weight, of $x$). Give a similar expression for the variance of the litter size (variance is the square of the standard deviation).

Exercise 2.5.21 (mean) The mean temperature in my office is 68 degrees Fahrenheit, based on five measurements on five different days. If I had recorded the temperatures in degrees Celsius,
(a) what would their mean be?
(b) Can you be sure without the original data? Why or why not?
(c) The standard deviation is 3 degrees F. Is the temperature control in my office adequate? Explain briefly.

Exercise 2.5.22 (Pasadena Jan 1 Temperature) The following table gives the high temperature (degrees Fahrenheit) at the Burbank Airport on New Year's Day for 10 years.

Year    Jan 1 High Temp (F)
2000    55
2001    75
2002    64
2003    68
2004    60
2005    54
2006    57
2007    71
2008    73
2009    72

a) Enter the data into a spreadsheet (e.g. Excel) and add a column with the Celsius temperature, $C = (F - 32)/1.8$.
b) Add an extra row with the mean of the Fahrenheit and Celsius temperatures.
c) Add an extra row with the standard deviation of the Fahrenheit and Celsius temperatures.
d) How many of the 10 observations fall within one standard deviation of the mean? How many fall within two standard deviations of the mean?
e) Give a formula for converting the mean Fahrenheit temperature ($\bar{F}$) to Celsius ($\bar{C}$).
f) Give a formula for converting the standard deviation ($s_F$) of the Fahrenheit temperature to the standard deviation ($s_C$) of the Celsius temperature.

Exercise 2.5.23 A study of college students found that the men had an average weight of about 66 kg and an SD of about 9 kg. The women had an average weight of about 55 kg and an SD of about 9 kg.
1. Find the averages and SDs, in pounds (1 kg = 2.2 lb).
2. Just roughly, what percentage of men weighed between 57 kg and 75 kg?
3. If you took the men and women together, would the SD of their weights be smaller than 9 kg, just about 9 kg, or bigger than 9 kg? Why?

Exercise 2.5.24 The 1000 Genomes project provides public access to human genome data, with the goal of finding most genetic variants that have frequencies of at least 1% in the populations studied. Variant Call Format (VCF) is one of the ways in which data on genetic variants is
presented. In VCF, one of the standard items of information on each variant is a quality score, defined as $-10 \log_{10}(p)$, where $p$ is the probability that the variant call is wrong.
(a) What is this score if the probability that the call is wrong is 0.01?
(b) What if the probability of being wrong is 0.2?
(c) A score of 10 is sometimes required to pass a filter. What error probability does this correspond to?

Exercise 2.5.25 Obtain the data file aconiazide.csv from the course website. This gives the change in weight (W), in grams, for rats that were fed one of five DOSES of a drug, aconiazide. Calculate the median, mean, and standard deviation for each dose. Write down what tool you used (e.g. Excel, Prism, R, R Commander, calculator, ...).

Exercise 2.5.26 (Difference of squares) Here is a useful algebraic identity about the difference of two squares: $x^2 - y^2 = (x - y)(x + y)$.
a) Establish that it is true by multiplying out the right-hand side.
b) (Optional) Can you draw a simple picture illustrating this identity as a statement about the areas of squares and rectangles? (Hint: Show a large square with sides of length $x$ with a smaller square of area $y^2$ cut out of its corner, so that the area is $x^2 - y^2$. Can you imagine cutting and moving a piece of the clipped square figure to get a rectangle?)

Exercise 2.5.27 (big numbers) Compute this difference in your head:

$(123456789)^2 - (123456787)(123456791)$

Hint: Let $x = 123456789$, and let $y = 2$. We want $x^2 - (x - 2)(x + 2)$. If you apply the algebraic identity from the previous exercise to the right-hand term, you shouldn't need to pick up a pencil.
1. What is the answer you get by the mental algebra above?
2. What do you get when you plug the expression $(123456789)^2 - (123456791)(123456787)$ into your favorite calculator (or Excel)?
3. If these disagree, which answer do you believe, and why?
4. What do you get using R?

Exercise 2.5.28 Do the R computing tutorial from chapter 1, and turn in the plot (with your name on it).
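
As background for Exercise 2.5.27, the sketch below evaluates an expression of the same shape in two ways: with exact integer arithmetic and with 64-bit floating point, which is what spreadsheets typically use. It is written in Python only because Python integers are exact at any size. The function names and the values of `x` and `d` are hypothetical and deliberately differ from the ones in the exercise.

```python
def exact_difference(x, d):
    """x^2 - (x - d)(x + d) using Python's arbitrary-precision integers."""
    return x * x - (x - d) * (x + d)

def float_difference(x, d):
    """The same expression evaluated in 64-bit floating point.

    The two products are far larger than 2**53, so each one is rounded
    before the subtraction takes place.
    """
    xf, df = float(x), float(d)
    return xf * xf - (xf - df) * (xf + df)

# Hypothetical values, chosen only for illustration.
x, d = 987654321, 1
print(exact_difference(x, d))   # 1, which equals d**2 by the identity
print(float_difference(x, d))
```

Comparing the two printed values shows how much of a small difference between two very large products survives the rounding.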