Modelling World Record Data for the Men's 100-Metre Dash: Can the Data Collected Predict the Ultimate in Human Achievement for this Event?


Introduction

While not really interested in sport or athletics events, I have always been drawn to news about records being broken. Usually, when such news is reported, the previous (just broken) record is given, along with how long it endured. Sometimes the new record is an incremental improvement on the previous one; other times it is an astounding jump. Sometimes the previous record has itself only just been set; other times it has prevailed for many years or even decades. The motivation behind this exploration has been to establish whether there are detectable patterns in the way records are broken, and whether a limiting value (if any) for the event can be identified.

The original direction of the exploration was to select a range of varied athletics events and collect world record data for them. By plotting the data on scatter graphs, it was hoped that trends could be observed and investigated by hand-fitting curves from functions that exhibit the properties of each scatter graph. Naturally, it was also envisaged that built-in regression tools from software could be used to obtain the parameters of the modelling functions. The first event studied was the men's 100-metre dash, and the data provided so many areas of deliberation and thought-provoking interpretation that it became the sole data set used to discuss the question: can the data collected predict the ultimate in human achievement?

Data Collection and Related Decisions

The obvious and quickest route to retrieving the data was Wikipedia. The data available was thorough and more than sufficient for this exploration. Not being a sportsman, I needed to make some decisions about the data I was confronted with to ensure that consistency and meaningful interpretation were maintained. I started by looking at the 100-metre track event for men and immediately ran into issues that had to be handled carefully. These included (in no particular order): wind speed and altitude of the track; the type of timing process used and its associated accuracy; ratified versus unratified timings; and duplicate times when records are equalled. In general I chose the simplest route possible: I ignored wind speed and altitude effects, I used the timings as given (either to the nearest tenth or the nearest hundredth of a second), and I included only ratified data, purely because the unratified data may contain unintentional and undesirable inaccuracies. Where a record time is duplicated because it has been equalled at a later date, I have chosen to exclude the later data point. My justification is that I am only interested in when the current record is first broken. Note that data collected in this way cannot provide evidence of a general improvement in the performance of athletes for a particular event. To elaborate on this last point, it may well be that an event has not had its record broken for over twenty years (because of a single outstanding performance all that time ago); however, if the average times produced by all contestants each year were taken into account, there might still be a significant improvement over those twenty years. For all data collected, the date 01.01.1900 seemed a convenient point in time to set t = 0. All dates on which records were broken have therefore been calculated as fractions of a year after this date.
Therefore, for example, a record set on 1st October 1925 would be given the time value 25.75. Any debate about whether such precision is necessary is rather moot, since these calculations can be produced in an instant in a spreadsheet program!
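Although the exploration uses a spreadsheet, the same filtering and date conversion can be sketched in a few lines of code. The snippet below is a minimal illustration rather than the actual workflow: the record list is hypothetical, and the function simply expresses each date as a fraction of a year after 01.01.1900 while keeping only ratified times that genuinely lower the record.

```python
from datetime import date

# Hypothetical sample of (date set, time in seconds, ratified?) records.
# Illustrative values only, not the actual data set used in this exploration.
records = [
    (date(1968, 10, 14), 9.95, True),
    (date(1983, 7, 3),   9.93, True),
    (date(1991, 6, 14),  9.90, True),
    (date(1994, 7, 6),   9.85, True),
    (date(2009, 8, 16),  9.58, True),
]

def decimal_years_since_1900(d):
    """Express a date as a fraction of a year after 01.01.1900."""
    year_start = date(d.year, 1, 1)
    year_end = date(d.year + 1, 1, 1)
    fraction = (d - year_start).days / (year_end - year_start).days
    return (d.year - 1900) + fraction

# Keep only ratified times that genuinely lower the record (ignore equalled records).
points = []
best = float("inf")
for d, t, ratified in sorted(records):
    if ratified and t < best:
        best = t
        points.append((decimal_years_since_1900(d), t))

print(points)
```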

Plotting the Data and Identifying Graphing Trends

The data collected for the men's 100-metre dash is included in Figure 1, together with the scatter graph obtained when it is plotted.

Figure 1: World record data for the men's 100-metre dash (table and scatter graph).

Several features are immediately evident and worthy of comment from this basic scatter graph. The data seems to follow a fairly convincing straight line! However, the data points are plotted on a coordinate system whose vertical scale extends all the way down to the horizontal axis. If the graph is zoomed in the y-direction, the relatively slight improvements marking new records are exaggerated in a way that makes a straight line of best fit a less obvious tool to reach for. Certainly, a line of best fit offers no possibility of predicting future performance, because it eventually leads to ridiculously impossible times that would require superhuman qualities in the real sense of the expression! And, of course, the ultimate ridiculous conclusion, if such a line were followed, is that the 100-metre race could eventually be completed in zero time! There is also a sudden and substantial dip at the last data point. This record was set by Usain Bolt on 16th August 2009, exactly one year after the previous record. Earlier in the record's history, the interval between records being broken is considerably larger than later on: the first ten of the twenty-one records listed occurred between 1891 and 1983 (almost one hundred years), while the remaining eleven occurred over a duration of twenty-one years! This feature will certainly have implications for the nature of the modelling function and suggests different ways it could or should be handled.
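For readers wanting to reproduce the scatter graph, a minimal plotting sketch is shown below. The data points are illustrative placeholders rather than the actual table in Figure 1; the key detail is restricting the vertical axis so that the small improvements between records become visible rather than being flattened against the horizontal axis.

```python
import matplotlib.pyplot as plt

# Hypothetical (decimal year since 1900, record time in s) pairs, as produced by
# the conversion sketch above. Illustrative values, not the real table.
points = [(68.79, 9.95), (83.50, 9.93), (91.45, 9.90), (94.51, 9.85), (109.62, 9.58)]

x, y = zip(*points)
plt.scatter(x, y)
plt.xlabel("Years after 01.01.1900")
plt.ylabel("World record time (s)")
plt.title("Men's 100 m world record progression")
plt.ylim(9.4, 10.8)   # zoom the vertical axis so small improvements are visible
plt.show()
```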

Figure 2: World record data for the men's 100-metre dash (zoomed vertical axis).

After zooming, there appear to be two different trends in the data. The last nine data points create a new trend that spoils what would otherwise have been a very good candidate for an exponential decay model! This led me to a new approach to the exploration theme I had set myself. Imagine it is 14th June 1991 and Leroy Burrell has just entered the record books with a new world record of 9.90 seconds. If this were the last data item to be plotted, how would the scatter graph appear? Would a model created from that plot be able to predict what has happened in the following two decades? The same scatter graph with those nine points removed is shown in Figure 3.

Figure 3: World record data for the men's 100-metre dash, last nine data points removed.

The points plotted can be seen to follow a trend of exponential decay. Therefore, a function f(x) given by

f(x) = (A - C)e^(-Bx) + C    (1)

was set up. The parameters A, B and C can be adjusted so that the resulting exponential decay curve passes through the plotted points as closely as possible. The advantage of presenting the function with the parameters as in Equation 1 is that, written this way, the value of C is the suspected limiting value (estimated by eye), and the value of A is the value at the start of 1900 (x = 0), again estimated by eye.

Figure 4: Exponential decay function y = f(x) as given by Equation 1, with A = 10.7, B = 1 and C = 9.5.

Figure 4 shows the basic curve with A estimated to be 10.7 and C estimated to be 9.5. The value of B starts at 1 and is then decreased to make the decay from the initial 1900 value down to the limiting value much gentler, so that the curve passes through the data points as well as possible. After taking B = 0.0186 and then raising C to 9.72, the reasonably successful and convincing fit shown in Figure 5 was achieved.
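The hand-fitting process described above is easy to mimic in code: choose A and C by eye, then nudge B until the curve threads the points. The sketch below assumes the parameter values quoted in the text and uses made-up data points purely to show the mechanics.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hand-fitted parameters from the text (estimated by eye, then tuned).
A, B, C = 10.7, 0.0186, 9.72

def f(x):
    """Exponential decay model of Equation 1: starts at A, levels off at C."""
    return (A - C) * np.exp(-B * x) + C

# Hypothetical record data (years after 1900, time in s), for illustration only.
points = [(12.5, 10.6), (30.5, 10.2), (56.4, 10.1), (68.8, 9.95), (83.5, 9.93), (91.4, 9.90)]

x = np.linspace(0, 120, 400)
plt.plot(x, f(x), label="f(x) = (A - C)e^(-Bx) + C")
plt.scatter(*zip(*points), color="red", label="records (illustrative)")
plt.xlabel("Years after 01.01.1900")
plt.ylabel("Record time (s)")
plt.legend()
plt.show()
```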

Figure 5: Exponential decay function y = f(x) after adjustment to fit the values, with A = 10.7, B = 0.0186 and C = 9.72.

I also wanted to see how the exponential regression function on my graphic calculator would handle this fit. This proved a more interesting endeavour, because the regression model it develops is of the form

y = a·b^x

This form ensures that, if 0 < b < 1, the graph will decay and the asymptote will be the horizontal axis. Of course, this location for the horizontal asymptote is not ideal for the model above! To fix this, I incorporated a vertical shift of all the data points downwards by the suspected limiting value. I then used the built-in regression tool to find the values of a and b that best fit the shifted data points, and recorded the value of the correlation coefficient r, which indicates the closeness of the fit. I then adjusted the vertical shift little by little, detecting improvement or deterioration in the fit from the value of r. Using a crude but effective bisection method, I identified that if the vertical shift is 9.71 to two decimal places, then the decay curve best fits the data, with the following readout from the calculator. When the vertical shift was 9.70 or 9.72, the value of r moved away from 1, indicating a worse fit. Obviously, these findings correspond extremely well with my earlier fit by eye! Now the above values of a and b have to be converted into the form given for f(x) in Equation 1.
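The shift-then-regress idea can also be sketched outside the calculator. The code below fits y = a·b^x by linear regression on ln y and searches over candidate vertical shifts for the one whose correlation coefficient r is closest to 1 in magnitude. It uses a simple grid search rather than the bisection described above, and the data values are illustrative only.

```python
import numpy as np

# Hypothetical (years since 1900, record time in s) data, illustrative only.
xs = np.array([12.5, 30.5, 56.4, 68.8, 83.5, 91.4])
ys = np.array([10.6, 10.2, 10.1, 9.95, 9.93, 9.90])

def exp_regression(x, y):
    """Fit y = a * b**x by linear regression on ln(y); return a, b, r."""
    ln_y = np.log(y)
    slope, intercept = np.polyfit(x, ln_y, 1)
    r = np.corrcoef(x, ln_y)[0, 1]
    return np.exp(intercept), np.exp(slope), r

# Search for the vertical shift (suspected limiting value) that pushes |r| closest to 1.
best = max(
    (abs(exp_regression(xs, ys - shift)[2]), shift)
    for shift in np.arange(9.50, 9.85, 0.01)
    if np.all(ys - shift > 0)   # shifted times must stay positive for the log
)
print("best shift:", round(best[1], 2))

a, b, r = exp_regression(xs, ys - best[1])
print("a =", a, "b =", b, "r =", r)
```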

Setting

a·b^x = A·e^(-Bx)    (2)

the initial value at the start of 1900 is found by setting x = 0, which gives

a = A    (3)

Therefore,

b^x = e^(-Bx)    (4)

and since e^(-Bx) is (e^(-B))^x, it is rapidly evident that

b = e^(-B)    (5)

or, alternatively,

B = -ln b    (6)

Using the value of b from the calculator readout above, the value of B that the calculator's regression tool corresponds to is about 0.0181, compared with my visual fit value of approximately 0.0186 (both to three significant figures). Taking the vertical shift into account, the 1900 value becomes a + 9.71 ≈ 10.7 to three significant figures. This is the value of A used in the visual fit, so the two models correspond very closely.

Figure 6: Comparing the best regression model from the calculator's built-in exponential regression tool with the fit produced earlier in the analysis by eye.

Figure 6 shows the calculator model superimposed on the model presented before. They are clearly very close indeed! Now for the crunch moment: how do the remaining values that were removed from the above regression analysis fit this model once they are put back in? How would my analysis stand up in hindsight, given the predictions made on 14th June 1991?
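As a quick numerical check of Equations (2) to (6), converting from the calculator's (a, b) form back to the (A, B) form takes only a couple of lines. The values of a and b below are assumed purely for illustration, since the calculator readout is not reproduced in this transcription; they are chosen only so that the outputs land near the figures quoted in the text.

```python
import math

# Assumed illustrative calculator output (a, b) and vertical shift; the actual
# readout values are not reproduced here.
a, b, shift = 0.99, 0.9821, 9.71

B = -math.log(b)        # Equation (6)
A = a + shift           # undo the vertical shift to recover the 1900 value
print(f"B = {B:.4f}, A = {A:.3f}")   # B close to 0.018, A close to 10.7
```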

Figure 7: Comparing the actual records set for the men's 100-metre dash from 1991 onwards with the regression models constructed using data up to 1991.

The new plots are included in red in Figure 7. It is clearly evident that the prediction made by the exponential decay regression curve is far from desirable! Not only do the new records appear one after the other very quickly, but they follow a curve that cannot reasonably be explained by the original model. Of course, it is possible to try to fit an exponential decay model to the complete data set, but this is a far from ideal approach to even attempt. A straight-line fit (as originally observed) might line up fairly well and provide a good correlation coefficient, but that model is certainly flawed purely in terms of its long-term predictions.

Reflections on the Story So Far

While entering the data into spreadsheets and other software, some curious reflections occurred to me which made me question the validity of some of the underlying assumptions and analysis approaches I had taken as givens when I embarked on the exploration:

- The (presumably) increasing size of the world-class athlete population, which increases the chance of exceptionally talented athletes competing and breaking records. One could regard the entire population of athletes, past and present, as a single population, each with a best time achievable at his optimum level of fitness and performance. Is it not simply a normal distribution that is being sampled whenever a particular athlete attempts to break the current record?
- The impact of a newly broken record in paving the way for further record attempts; that is, new record attempts (successful or not) are not random occurrences independent of the current record.
- The huge jumps in the time intervals between records, the trend towards shorter intervals, and the effects of wider world events such as warfare on record attempts.
- The effect of one or two individuals on the data trends and therefore on the entire modelling process.
- The role of media and media hype in pushing for further records to be broken.
- The role of new training schemes (including special diets) that enhance the performance of the human body.
- The use of illegal bodybuilding drugs to change the physique of the human body. While these are banned, it would be ridiculous to claim that every record ever broken has been set cleanly; some of the data must have been corrupted in this way, and it only takes one or two data points in this kind of analysis to throw the whole modelling process into disarray.

The above reflections highlight many different directions the exploration could take in order to enhance or refine the model. Indeed, some research into this topic reveals that very advanced statistical and sampling techniques have already been used to address questions similar to those raised in this exploration.

A Tale of Two Lines

Suppose that, rather than being positioned along the real time axis, the world record-breaking events are plotted evenly spaced along the horizontal axis. Rather than treating the data as bivariate, we look at it as a single variable being generated and ask whether the preceding values indicate what the next value could be. The result of working with the data from this perspective is quite striking. There is clearly a very sharp kink in the resulting graph at the eighth point (Jim Hines's record of 9.95 seconds, set on 14th October 1968). The data either side of this kink can be modelled by two regression lines that intersect at this point. The second regression line is considerably less steep than the first, indicating a marked and consistent slowing of improvement within this event-time space rather than real time. Usain Bolt's current record still stands well off the line, but with that single data point removed, the two lines produced fit very satisfactorily.

Figure 8: Plotting the world record times of the men's 100-metre dash, equally spaced along the horizontal axis.
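Fitting the two regression lines either side of the kink is straightforward once the records are indexed 1, 2, 3, ... rather than dated. The sketch below uses hypothetical record times (with Usain Bolt's 9.58 s omitted, as in Figure 9) and fits one least-squares line to each side of the eighth record; the only assumption carried over from the text is that the kink sits at record number 8.

```python
import numpy as np

# Hypothetical record times indexed by record number (1st, 2nd, ...), not the real data.
times = np.array([10.6, 10.4, 10.3, 10.2, 10.1, 10.06, 10.03, 9.95,          # up to the kink
                  9.93, 9.92, 9.90, 9.86, 9.85, 9.84, 9.79, 9.77, 9.74, 9.72, 9.71, 9.69])
idx = np.arange(1, len(times) + 1)

kink = 8  # the 8th record (9.95 s) is where the trend appears to change

# Fit one least-squares line to each side of the kink (both segments include the kink point).
m1, c1 = np.polyfit(idx[:kink], times[:kink], 1)
m2, c2 = np.polyfit(idx[kink - 1:], times[kink - 1:], 1)

print(f"first segment slope:  {m1:.3f} s per record")
print(f"second segment slope: {m2:.3f} s per record (much shallower)")
```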

Figure 9: Fitting two regression lines to the data from Figure 8 in an attempt to model the clear kink in the data set that is evident when the world records are displayed in this way. The final record set by Usain Bolt is omitted to show clearly how well the regression lines follow the data.

Figure 10: As Figure 9, but with the final record set by Usain Bolt included. This record looks like one that should have occurred later, after approximately three more records had been set.

From Figure 10, it is possible to see that Usain Bolt might just have been a remarkable one-off who, on this trend, should have set his current record after two or three more records had been broken. Maybe this is where one of the factors mentioned at the start of this exploration, or another as-yet-unidentified factor (crucial for the sake of shaving off a fraction of a second), came into play. Was there a brief tail wind? Could it have been the altitude? Or some extra traction between the soles of his footwear and the surface of the running track that gave him the edge that day? Does this suggest that the best way of modelling this data is to create a set of piecewise regression lines? Unfortunately, this does not solve the modelling problem. It simply moves the challenge to new questions: can the next gradient be predicted, and the (real-time) point at which the change occurs? For this there is clearly insufficient data to reach a conclusion. A completely unjustified but suggested future development of records for the men's 100-metre dash is included as a pink regression line in Figure 11.

The next stage in this process would be to examine other running events, both for men and for women. Similar trends and analyses might also apply to other events, such as swimming or hurdles.

Figure 11: A proposal for how the piecewise linear model might develop for future world records in the men's 100-metre dash event. The pink regression line has no justification whatsoever from any given data.