Prior Design for Dependent Dirichlet Processes: An Application to Marathon Modeling

Size: px
Start display at page:

Download "Prior Design for Dependent Dirichlet Processes: An Application to Marathon Modeling"

Transcription

1 RESEARCH ARTICLE Prior Design for Dependent Dirichlet Processes: An Application to Marathon Modeling Melanie F. Pradier 1,2 *, Francisco J. R. Ruiz 1,2,3, Fernando Perez-Cruz 1,2,4 a Signal Theory and Communications Department, University Carlos III in Madrid, Madrid, Spain, 2 Gregorio Marañón Health Research Institute, Madrid, Spain, 3 Data Science Institute, Columbia University, New York, United States of America, 4 Member of the Technical Staff at Bell Labs, Alcatel-Lucent, Murray Hill, New Jersey, United States of America * melanie@tsc.uc3m.es Abstract OPEN ACCESS Citation: F. Pradier M, J. R. Ruiz F, Perez-Cruz F (2016) Prior Design for Dependent Dirichlet Processes: An Application to Marathon Modeling. PLoS ONE 11(1): e doi: /journal. pone Editor: Guy Brock, University of Louisville, UNITED STATES Received: May 18, 2015 Accepted: January 4, 2016 Published: January 28, 2016 Copyright: 2016 F. Pradier et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability Statement: All relevant data are within the paper and its Supporting Information files. Data will also be available at the author s personal webpage. Funding: The authors are grateful for financial support from the following institutions: Melanie F. Pradier is supported by the European Union 7th Framework Programme through the Marie Curie Initial Training Network Machine Learning for Personalized Medicine MLPM2012, Grant No ( Francisco J. R. Ruiz is supported by an FPU fellowship from the Spanish Ministry of Education (AP ). This work is This paper presents a novel application of Bayesian nonparametrics (BNP) for marathon data modeling. We make use of two well-known BNP priors, the single-p dependent Dirichlet process and the hierarchical Dirichlet process, in order to address two different problems. First, we study the impact of age, gender and environment on the runners performance. We derive a fair grading method that allows direct comparison of runners regardless of their age and gender. Unlike current grading systems, our approach is based not only on top world records, but on the performances of all runners. The presented methodology for comparison of densities can be adopted in many other applications straightforwardly, providing an interesting perspective to build dependent Dirichlet processes. Second, we analyze the running patterns of the marathoners in time, obtaining information that can be valuable for training purposes. We also show that these running patterns can be used to predict finishing time given intermediate interval measurements. We apply our models to New York City, Boston and London marathons. 1 Introduction Data in the real world typically involves some source of uncertainty. This uncertainty may come from noisy measurements, incomplete information, or from the fact that we only have access to a subset of the data from a larger population. Probabilistic models have proven to be an effective approach for understanding such data, by incorporating our assumptions and prior knowledge of the world. In fact, probabilistic models have become an important tool in all areas of science as a way to develop statistical algorithms that are able to learn hidden structures from the data and make predictions [1]. Within probabilistic models, Bayesian nonparametric (BNP) models present several desirable properties [2]. Their most recognizable benefit is to avoid the need of specifying a closedform model, i.e., a parametric model with a predefined fixed number of hidden variables. PLOS ONE DOI: /journal.pone January 28, /28

2 also partially supported by the Ministerio de Economia of Spain (projects COMONSENS, id. CSD , and ALCIT, id. TEC C03-01), by the Comunidad de Madrid (project CASI-CAM-CM, id. S2013/ICE-2845), and by the Office of Naval Research (ONR N ). Bell Labs Alcatel-Lucent provided support in the form of salaries for author FPC, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific role of these authors are articulated in the author contributions section. Competing Interests: The authors declare that the author Fernando Perez-Cruz is employed by Bell Labs Alcatel-Lucent. This does not alter the authors adherence to PLOS ONE policies on sharing data and materials. Instead, BNP priors place probability mass on an infinite range of models and let the inference procedure choose the one that best fits the data, in order to provide competitive predictions or density estimations [3]. Although this flexibility makes such models attractive for experts in other fields, obtaining interpretable results might be an even stronger requirement. Most BNP models are described as general priors [4 7] that might not give easy-to-interpret solutions if applied blindly, even if they provide accurate predictions. In order to additionally obtain interpretable results, we should tailor the design of the model and include all our prior knowledge in the assumptions. We need to take problem-specific design choices that will allow for an easy interpretation of the results afterwards. In this way, the first insights of the obtained results should not be foreign to us. This makes the model trustworthy for experts in other fields that do not know about machine learning or statistics, so other conclusions that were not common knowledge can be taken as plausible. At this stage we are able to formulate hypotheses that can be tested with future data and can provide previously unknown insights about the given problem. Such procedure avoids the frequent black-box flavor found in other methods, facilitating collaboration across fields. Examples of such multi-disciplinar efforts can be found in psychiatry [8], genetics [9], biostatistics [10], computer vision [11], econometry [12] or musicology [13]. This paper presents a novel application of BNPs to model marathon runners. We opt for BNP models to leave room for the unexpected. In particular, we make use of the dependent Dirichlet process (DDP) [14], which is a powerful tool that encompasses the Dirichlet process (DP) and the hierarchical Dirichlet process (HDP). However, the DDP is very general and it cannot be directly applied to data without additional constraints. Here, we specify a way to tie the parameters across groups using a Gaussian process [15], thus making the DDP a practical prior for our problem at hand. This paper is an application of the single-p DDP and the HDP to marathon modeling. The novelty of the paper relies not only on the application, but also on the necessary steps to transform a prior that provides accurate estimates into a prior that also gives interpretable results. Non-trivial structural assumptions and design solutions are made to find hidden properties of the athletes while providing accurate predictions. We believe that BNP models will be more useful in the future for experts without machine learning expertise if we can tailor the priors to provide accurate and understandable solutions. The rest of the paper is organized as follows. Section 2 describes the two problems addressed in this paper. In Section 3, we review the BNP priors behind our models. We then introduce the statistical models used in this paper. In particular, Section 4 presents the atom-dependent Dirichlet process (ADDP), a collection of BNP mixture models based on the single-p DDP prior coupled with a Gaussian process, while Section 5 describes an HDP over probability vectors to detect latent running profiles and perform predictions. Section 6 is devoted to the experiments and results. Our conclusions and further discussion can be found in Section 7. 2 Problem Overview 2.1 Age-grading problem We first consider the age-grading problem, in which we want to compare fairly the finishing times of runners having different age and gender. Currently, most popular marathons award entry to participants by their best marathon in the previous 12 months. For example, in Boston it is the only way a participant can gain entry to the race, while other paths are available in New York, Chicago or London. The proposed methodology can be used to equalize entry requirements for different marathons, which vary considerably for one event to the next, as there is no PLOS ONE DOI: /journal.pone January 28, /28

3 widely accepted standard method to specify them. Furthermore, the World Master Athletics (WMA) has an age-grading system [16] for equalizing the finishing time according to age and sex. They lobby for this measure to be taken into consideration for selecting the winners of each race, even though that procedure is based on world records, i.e., outliers that may not be very representative, or even realistic, for most races. Our method also provides an alternative way to reward runners of different ages and gender. Our approach consists in adapting a single-p DDP [5] to cluster the finishing time for each runner according to his/her age and sex. We propose a Gaussian process to control how the clusters (representing marathon finishing time) change from one group to the next (different ages or gender). We find that the means of these clusters are directly comparable to the marathon entry requirements and the age-graded tables from the WMA. Additionally, direct comparisons for any finishing time are straightforward, since we find a full distribution for all ages and both genders. Our single-p DDP can simultaneously deal with different races and/or the same race on different years, providing a unified ranking for all the races that may differ on elevation profile, temperature or humidity. 2.2 Running pattern problem The second problem considered herein is the temporal analysis of running patterns. Our objective here is twofold. First, we are interested in a tool that can capture latent running profiles that reflect the marathon difficulties along the 26.2 miles ( km). This can be useful for athletes training purposes. Second, we aim at predicting the arrival time of runners using intermediate records. This problem has already been addressed in [17], where the finishing times are imputed for the 2013 Boston marathon. One of the best approaches rely on the 100 nearest neighbors, which has the limitation of clustering runners that are doing the same absolute times. We propose to use an HDP [6] to model the fraction of time each runner has spent at each intermediate interval (typically, measures are taken every 5 km and at half-marathon). In this way, we cluster the time ratio instead of the absolute times. Runners that run at different but constant speed will be in the same cluster, no matter if they run each mile in five, eight or eleven minutes. Thus, this model allows estimating finishing times for slower runners that have the same time-ratio profile than fast ones. We use an HDP model in which the likelihood function is a Dirichlet distribution, and each DP clusters the runners by age group and sex. When modeling the full race, it helps to understand the different trade-offs and which parts of the race are harder. 3 Dependent Dirichlet processes In this Section, we briefly review the stochastic processes on which our models are based. In particular, we focus on the DDP and two of its variants. The DDP is a generalization of the DP for groups of data [5]. The DP is a stochastic process whose realizations are random infinite discrete probability distributions [18]. A DP is completely specified by a base distribution H, which is the expected value of the process, and a positive real number α (usually referred to as concentration parameter), which plays the role of an inverse variance. In general, a draw G 0 * DP(α, H) from a DP can be expressed as G 0 ¼ X1 k¼1 p k d k ; ð1þ where the vector π contains the atom weights, and ϕ k are the atom locations defined in the parameter space. In this form, it can be shown that the atom locations ϕ k are independent and PLOS ONE DOI: /journal.pone January 28, /28

4 identically distributed as ϕ k * H, whilst the weights (also known as stick proportions) are distributed as π * GEM(α), a distribution named after Griffiths, Engen and McCloskey [19, 20]. Such representation is called stick-breaking representation [21]. Note that Eq (1) defines an infinite mixture model, i.e., a mixture model with a countably infinite number of clusters. However, since the weights π k decrease exponentially fast, only a finite number of clusters are used to describe any finite dataset [3]. In fact, the expected number of components grows logarithmically with the number of observations [6]. In the DP mixture model, the actual number of clusters describing the data is not fixed, and can be automatically inferred from the data using the usual Bayesian posterior inference framework, e.g., Markov chain Monte Carlo [22] or variational [23] methods. The Chinese restaurant process (CRP) is typically found in the literature of DPs, a standard culinary metaphor that vividly illustrates how the DP operates [6]. In this metaphor, the atom locations ϕ k are referred to as dishes in a restaurant, and observations that are clustered together are viewed as customers sitting on the same table, therefore eating from the same dish. In this generative process, customers enter the restaurant one at a time, and they can either sit on an existing table, with probability proportional to the number of previous customers that are already sitting on that table, or open a new table, with probability proportional to α. In the latter case, they also sample a new dish from the prior, i.e., ϕ k new * H. In its more general form, the DDP is a stochastic process that generalizes the DP and that can be applied for clustering groups of data [5]. For each group j, we have an infinite mixture model of the form G j ¼ X1 k¼1 p jk d jk ; ð2þ where G l is a group-specific random measure, and the atom weights π jk and atom locations ϕ jk follow stochastic processes on the covariate space j. We describe below the two particularizations of the DDP that we use throughout the paper: the hierarchical DP (HDP) and the singlep DDP. Fig 1 shows a comparative sketch between both models. 3.1 Hierarchical Dirichlet process The HDP is a particular DDP to cluster groups of data sharing mixture components [6]. It uses a DP for each group of data, with the DPs for all groups sharing a base distribution which is Fig 1. Comparison of two Dependent Dirichlet processes. Hierarchical DP is at the left, single-p DDP is at the right. The first one shares atom locations, while the second one shares mixture weights. doi: /journal.pone g001 PLOS ONE DOI: /journal.pone January 28, /28

5 itself drawn from a DP. Mathematically, we first draw a base distribution from a DP as G 0 * DP(γ, H), where G 0 ¼ P 1 k¼1 v kd k, and for each group j we draw a distribution from a DP using G 0 as the base distribution, i.e., G j * DP(α, G 0 ). This construction ensures that all the distributions G j share the atom locations given by G 0, since G 0 is itself a discrete probability distribution. Since each atom corresponds to a cluster, cluster parameters ϕ k are shared across all groups. Each G j admits a representation as G j ¼ X1 k¼1 p jk d k ; ð3þ where the atom locations ϕ k do not depend on the group j. Furthermore, this method allows groups to share statistical strength via the atom weights v k of the base distribution G 0. Indeed, the vector of weights for each group j can be obtained as π j * DP(α, v). In the corresponding culinary metaphor, the HDP can be explained with a Chinese restaurant franchise (CRF), in which there is a collection of restaurants, and dishes are shared across restaurants. However, the popularity of each dish, i.e., the corresponding atom weight, is different in each of the restaurants [6]. 3.2 Single-p dependent Dirichlet process The single-p DDP is another DDP, which works in a complementary fashion to the HDP. In this case, atom weights are shared across groups while atom locations are allowed to vary across groups. In terms of the often used culinary metaphor for DPs [2], the HDP shares the dishes across restaurants but allows a different dish popularity in each of the restaurants, while the single-p DDP shares the dish popularity across restaurants but allows the dishes to vary slightly, in order to better fit each group of customers. The latter would be a peculiar CRF in which the popularity of tables is matched one-to-one across restaurants, with the served dish in each linked table slightly customized in each restaurant (e.g., different ingredients, cooking time or local taste). In the single-p DDP, the latent measure for each group j can be expressed as G j ¼ X1 k¼1 p k d jk ; ð4þ where the vector π * GEM(α) contains the mixture weights and ϕ jk are the atom locations. The single-p DDP does not specify how to tie the atom locations ϕ jk across groups for each k. This step is critical, as it conditions the performance of the model. We put forward in Section 4 another stochastic process for this purpose. 4 Modeling of the finishing time 4.1 Atom-dependent Dirichlet process mixture model This section describes a BNP model based on the single-p DDP prior that allows comparing the shape of different distributions while keeping the corresponding quantiles fixed. In the context of the marathon, we use it to obtain a fair comparison between runners regardless of their age or gender. Runners are grouped together according to their age and gender, yielding J different groups. In our infinite mixture model, we cluster the runners of all groups with a potentially unbounded number of clusters. Each cluster k presents a fixed percentage of runners given by π k, with a stochastic process linking the atom locations, i.e., the mean finishing time. This construction has the potential to provide a direct comparison for the finishing time in each group j. However, as the single-p DDP is a very general prior, we need to define the PLOS ONE DOI: /journal.pone January 28, /28

6 likelihood and the stochastic process in a way that is insightful about the marathoner s finishing time. We denote each marathoner finishing time as x ji, where j =1,..., J indexes the group and i runs over marathoners, and we assume a Gaussian likelihood for the finishing time x ji : x ji jc ji ¼ k; m k ; y j ; s 2 x N x ji jm k þ y j ; s 2 x : ð5þ Here, μ k denotes the global mean for cluster k, θ j is the shift associated to group j, c ji represents the cluster assignment associated to observation x ji, and s 2 x is the variance of the Gaussian distributions. Hence, we use a cluster-specific parameter μ k to describe cluster k, but we allow deviations from this value due to age or gender (this effect is modeled by θ j ). The key aspect that makes the single-p DDP useful for comparing different age-gender groups is the stochastic process that governs θ j. We would like this value to vary smoothly across ages, and therefore we choose a zero-mean Gaussian process prior for it: θ N ð0; Σ y Þ; ð6þ where θ =[θ 1,..., θ J ] >, and for the covariance function we use the standard choice, i.e., the squared exponential kernel given by 2 ðσ y Þ j ¼ s 2 exp ð jþ y þ kdð jþ; ð7þ 2n 2 where l and j represent two different age groups, s 2 y accounts for the variance, ν controls the degree of correlation between age groups l and j, and κ is a jitter factor to avoid numerical instabilities. We use an independent Gaussian process for each gender. There are other alternatives, see [15] for a comprehensive introduction for valid covariance functions. In our case, the squared exponential kernel is a smooth kernel (infinitely differentiable) that captures the correlation between the different age groups. Finally, we place the following priors over the assignment variables c ji and the cluster weights π: and c ji jπ π; ð8þ πja GEMðaÞ; ð9þ We refer to this single-p DDP prior, together with the likelihood model in Eq (5) and the Gaussian process for θ j, as the atom-dependent Dirichlet process (ADDP) mixture model. This model is similar to the ANOVA-DDP prior in [24]. We rely on a Gaussian process to control the dependency between the shift variables θ j, while the ANOVA-DDP prior relies on independent priors. This allows for smooth variations of the cluster means with the age. A graphical representation for the ADDP is depicted in Fig 2. We place a Gaussian prior over the cluster means μ k and an inverse gamma prior over the variance s 2, i.e., x mk N m0 ; s2 0 ; ð10þ s 2 x IGða; bþ; ð11þ where μ 0, s 2, a and b are hyperparameters of the model. The value of 0 s2 0 is assumed to be much PLOS ONE DOI: /journal.pone January 28, /28

7 Fig 2. Graphical representation of the basic ADDP mixture model. Grey circles represent observed variables, white circles are hidden random variables. Plates refer to duplicated random variables. doi: /journal.pone g002 larger than s 2 y, so that the first one controls the overall finishing time for the clusters (hours), whilst s 2 y controls the differences between groups due to different ages (minutes). We assume a common variance s 2 x for all the clusters, because it provides an ordering of the clusters, which is necessary for comparing the finishing times. Allowing for different variances for each component should provide a more accurate (in the sense of better density estimation) description of the finishing times, but a less interpretable and less actionable representation, as runners assigned to Gaussian components with different variances are not directly comparable. We have not placed a joint prior for the cluster means and variances through the normal-inverse gamma distribution, since separate priors might have better properties for density estimation [25] and allow for faster Gibbs sampling inference. 4.2 Model extensions Multiple races. We could apply the previous models to finishing times from different races or years, but we might obtain unexpected results since different races can present unalike conditions due to temperature, elevation profile, humidity, pull of runners, etc. In this section, we present an useful extension that deals with different races all together and allows drawing comparisons between these races. In order to deal with this, we extend the basic ADDP model using varying weights across races. This leads to a hierarchical ADDP model, in which cluster weights are allowed to change across races, and cluster parameters are allowed to change across age-gender groups. We refer to this model as the hierarchical ADDP (H-ADDP) model. Fig 3 shows a graphical representation of this extended model, with the following likelihood and PLOS ONE DOI: /journal.pone January 28, /28

8 Fig 3. Graphical representation of the H-ADDP mixture model. Cluster weights change across races, whereas cluster means change across age-gender groups. doi: /journal.pone g003 priors: x rji jc rji ¼ k; m k ; y j ; s 2 x N x rji jm k þ y j ; s 2 x ; m k N m 0 ; s 2 0 ; θ N ð0; Σ y Þ; s 2 x IGða; bþ; c rji jπ r π r ; π r jv; a DPða; vþ; vjg GEMðgÞ; ð12þ where r indexes the different races, γ is the upper level concentration parameter, π r ¼ðp rk Þ 1 k¼1 are the mixture weights for race r, v ¼ðv k Þ 1 k¼1 are the global weights for all races. One simple way to interpret this model is by conditioning on a particular race or an agegender group. If we only have data from a single race, we recover our original ADDP model. If we only have data from a single age-gender group, we recover an HDP, i.e., the cluster components are shared, but the mixing proportions are different. Age-gender interaction. The basic ADDP mixture model considers male and female runners independently, assuming independent shift delays θ j between both genders j (i.e., we use a PLOS ONE DOI: /journal.pone January 28, /28

9 block-diagonal covariance matrix). A natural extension of the model consists in introducing an additional gender factor δ and some age-gender interaction factors ω j in order to capture the correlation between male and female athletes. In such model, the shift delays θ j are shared for both men and women, and j indexes different age groups instead of age-gender groups. We refer to this model as the age-gender interaction ADDP model. The generative model can be written as follows: x ji jc ji ¼ k; g ji ; m k ; y j ; s 2 x ; d; o j N x ji jm k þ y j þ g ji ðd þ o j Þ; s 2 x ; m k N m 0 ; s 2 0 ; θ N ð0; Σ y Þ; s 2 x IGða; bþ; c ji jπ π; πja GEMðaÞ; d N 0; s 2 g ; ð13þ ω N ð0; Σ o Þ; where the indicator variables g ji differentiate male (g ji = 0) and female (g ji = 1) runners, δ is the gender effect, and ω =[ω 1,..., ω J ] > contains the age-gender interaction factors influencing the likelihood of female runners. Additionally, s 2 and S g ω are hyperparameters of the model. Cluster-dependent shifts. We now present another extension of the model concerning the shifts θ. In the model described above, the delay θ j only depends on the age and gender, which implies that the shift is the same for all clusters k, no matter whether they are fast or slow runners. However, we can also consider cluster-dependent shifts. This allows us to capture different shift evolutions across age/gender depending on the speed of runners. That is, instead of having a single delay θ j for each group, we consider a different delay θ jk for each cluster and group. Each vector θ k =[θ 1k,..., θ Jk ] > follows its own Gaussian process with mean μ θ and covariance matrix S θ : θ k N ðμ y ; Σ y Þ: ð14þ Table 1 shows a summary of all the ADDP model proposed in this paper. The main difference between them is the way in which the atom locations vary across the covariate space, i.e., the age-gender space. In the case of the H-ADDP model, atom positions depend on the age and gender of runners, and weights vary as a function of the race. Table 1. Summary of the ADDP mixture models developed in this paper. Model Atom weights Atom locations Basic ADDP π k μ k + θ j H-ADDP π rk μ k + θ j Age-gender interaction ADDP π k μ k + θ j + g ji (δ + ω j ) Cluster-dependent ADDP π k μ k + θ jk doi: /journal.pone t001 PLOS ONE DOI: /journal.pone January 28, /28

10 5 Modeling of running patterns In this section, we propose an HDP model as a tool that complements the analysis of the previously introduced ADDP. In particular, we address the analysis of temporal evolution of runners during the race, in order to understand how the marathoners pace themselves to complete the marathon. We aim at discovering running patterns, i.e., distinguishing those overly optimistic runners, with decreasing speed along the race, from well-trained runners who tend to keep a constant speed. It also helps to understand where the marathons are harder, so that runners can know beforehand. The hidden running patterns can be used for training purposes, as they can find out the typical shortcomings of athletes with respect to their age, which may help runners train and run more intelligently. In addition, we also show that discovering running patterns provides a new tool for prediction of finishing times, with results comparable to the best reported method in [17]. The idea is to cluster the data according to the relative time spent in each interval regardless of each runner s total time. In this sense, we are no longer interested in the absolute times, but in the time proportions invested for each interval. Marathons tend to record the elapsed time every 5 kilometers, in addition to half and full-marathon times. Our input data in this case consists of an N D dimensional matrix X with the time spent for each interval, together with the age and gender. Here, N is the number of runners and D denotes the number of available time records. We normalize our input data so that each runner is represented as a vector containing the fraction of time spent for each intermediate interval. We split the data X into J groups of runners having the same gender and belonging to the same age group. Here, we do not use the actual age of the runners, but split them according to age groups (i.e., age ranks) instead. We use the HDP [6] to cluster the running patterns for the different groups. In the HDP, clusters are allowed to show different probabilities for each group, but the per-cluster parameters are shared across groups. As explained in Section 3, we first draw a global base distribution from a DP as G 0 * DP(γ, H), where G 0 ¼ P 1 k¼1 v kd k, and for each group j we draw a distribution from a DP using G 0 as base distribution, i.e., G j * DP(α, G 0 ). In our model, the likelihood function is a Dirichlet distribution and can be written as x ji jc ji ¼ k; p k Dirichletðtp k1 ;...; tp kd Þ; ð15þ where x ji is the normalized D-dimensional vector for runner i in group j, c ji represents its cluster assignment, p k =[p k1,..., p kd ] is the vector of patterns representing cluster k, and τ is the concentration hyperparameter of the model. We place a Dirichlet prior over the per-cluster vectors p k, p k Dirichletð 1 ;...; D Þ; ð16þ where l d is the length of interval d, and is its concentration hyperparameter. 6 Experiments In the following experiments, we apply the described models to the New York City marathon [26] for 6 different years, between 2006 and This database consists of 249,899 runners in total. We additionally compare the NYC marathon data to the marathons of Boston [27] and London [28] for 2010 and 2011, including 117,255 additional runners. Data is included in S1 Marathon Database File. In order to test the resulting models, we set aside a test set with 20% of the participants for each race and age/gender group, ensuring that the age and gender proportions are the same in both train and test sets. PLOS ONE DOI: /journal.pone January 28, / 28

11 Inference and parameter estimation. Posterior inference for all simulations is based on Gibbs sampling. Following Algorithm 8 in [29], we do not integrate out the hidden variables, and we propose 10 new clusters at each iteration. Simulations are run in MatLab, with an approximated running time of 2.3 seconds per iteration for the plain ADDP model using an Intel Core i7-4700mq. In our results in Section 6.1, we report the values of the hidden variables (means and shift delays) averaged for the last 10,000 iterations after running the sampler for 50,000 iterations. In Section 6.2, we run 10,000 iterations of the sampler and average the results for the last 2,000 iterations. For the per-cluster variables, we carry out the averaging procedure to account for potential label switching. Label switching is common in Bayesian mixture models, and it occurs because the likelihood is invariant to permutation of the cluster labels [30, 31]. Such problem might occur in our case, but it is easy to detect by looking at the traceplot of the cluster-specific parameters, and hence parameters across MCMC samples can easily be matched and averaged. Moreover, in our plots we only report the value of the most populous clusters after filtering out the spurious ones with few runners. These spurious clusters appear as a consequence of the nonparametric nature of the model and the inference procedure, and can also be easily detected by looking at the traceplot of the cluster means. Without more data we cannot actually tell whether there is an actual phenomenon in them or they have locked in a noisy pattern. Choice of hyperparameters. In the case of the basic ADDP mixture model in Section 4, we use: s 2 ¼ 0:05, ν = 10, κ y =10 6, a =1,b =1,μ 0 = 5 and s 2 ¼ 1. The value of μ 0 0 and σ 0 are set so that 2-hour marathoners are within 3 standard deviations and 9-hour marathoners (typical cut-off time) are not unheard of. The ratio between σ θ and σ 0 is approximately 1 to 4, so the former is of the order of 15 minutes, while the latter is of the order of one hour. Therefore, σ 0 controls the overall finishing time, while σ θ controls the differences among age groups. The values of a and b control the variance of each Gaussian component and are set so that values of less than an hour, but not too small, are more likely. ν was defined in Eq (7) and controls the correlation between the mean finishing time of runners having same gender and different ages, so that they do not deviate significantly. The value of κ measures the error in the recorded time and in this case it prevents numerical instabilities. In the age-gender interaction ADDP model, we choose the same hyperparameters as in the basic ADDP model, and the additional hyperparameters are set as s 2 ¼ 0:05 (which corresponds to the same prior variance as for θ g j) and S o ¼ 1 S 2 y. For the HDP model in Section 5, we set = 0.2 km 1 and τ = 5000, because the differences between the relative time spent at each 5-km interval are typically small. Since τ plays the role of an inverse variance, the larger its value, the more clusters we should expect. We have found empirically that values of τ ranging in the thousands (between 2500 and 10000) provide results that admit similar interpretations, and they only differ in the granularity of the clusters. Finally, we place a Gamma prior with shape 1 and scale 10 over the concentration parameters α and γ, and sample their values following [32]. These hyperpriors are chosen in order to avoid the creation of too many spurious clusters. Due to the huge amount of data, results are not very sensitive to hyperparameter values, as long as they are not set to completely misleading and unrealistic values. In our application at hand, we can actually set (and be able to explain) the values of the hyperparameters using all our prior knowledge, as detailed above, which is important in order to incorporate the known information and allow for the expected variances. 6.1 Modeling of the finishing time Density estimation. We first show the performance of our basic ADDP mixture model in terms of density estimation, compared to a standard HDP with Gaussian likelihood, see [6] for PLOS ONE DOI: /journal.pone January 28, / 28

12 details. We depict in Fig 4 the empirical histogram for a particular group of runners (we choose all forty year-old male finishers for the plot), together with the overall density estimates, using data for the six considered NYC marathons. We obtain σ x = 5.8 minutes. Both the ADDP and HDP model perform similarly in terms of density estimation. Note that the empirical histogram in Fig 4 presents one narrow peak that is not fully captured by the ADDP nor HDP model. This peak, just under 4 hours, and the valley right afterwards are due to some runners trying to finish (and succeeding) a sub-4-hour marathon. This is a psychological effect that has limited interest for us, since it is not indicative of runners inherent performance. Using cluster-specific values for the variance s 2 x would yield better density estimation, even capturing this peak, but it would fail to provide any comparison between distributions. Here, we are interested in ordering runners into clusters for comparison, which is achieved by having a shared value of s 2 x for all clusters. Impact of age and gender. We now use the age-gender interaction ADDP model described in Subsection 4.2. In addition to its density estimation capacity, the ADDP has an additional descriptive strength, since it can show the impact of age and gender on runners performance straightforwardly through inference of the age delays θ j, gender factor δ and age-gender interaction factors ω j. Fig 5 shows the average proportion of runners in each cluster, as well as the inferred cluster means μ k, for both the ADDP and a standard HDP with Gaussian Fig 4. Density estimation capacities for the basic ADDP model. The histogram corresponds to the population of forty-year-old male runners, which is the largest age-gender group. The red curves are the probability density functions inferred by the basic ADDP model. The blue dotted lines represent the inferred individual clusters. doi: /journal.pone g004 PLOS ONE DOI: /journal.pone January 28, / 28

13 Fig 5. Age proportions per cluster. The HDP is at the top and the age-gender interaction ADDP is at the bottom. Male and female runners are shown together. The colors design the age of the runners, from 18-year-old in blue to 69-year-old in dark red. For the ADDP model, the cluster labels correspond to μ k + θ j, with the value of θ j corresponding to the 28-year-old male runners, the shifts for other age or gender can be found in Table 2. doi: /journal.pone g005 likelihood [6]. Runners aged above 69 are not reported because there are too few of them. The HDP results are not easy to interpret, except for the first three clusters that show the time degradation with age, because we do not know the cumulative percentage as the finishing time is increased. The ADDP model is much easier to understand. The first cluster contains the Olympic quality runners for all ages, if Olympics were held for each age group (less than 1% of the runners). The second cluster has the competitive runners (about 13% of the runners), the third cluster has the standard marathoners (about 33% of all runners), and so on. The x-axis in Fig 5 provides the value of μ k + θ j for 28 year-old male runners and the degradation for other ages and sex is shown in Table 2. Using the plot and the table, we can know the proportion of runners in each group and how much extra time they need compare to the fastest group. Fig 6 shows the value of the inferred cluster means μ k plus and minus one standard deviation, shifted according to θ j, δ, and ω j for both men and women. We only depict the two fastest clusters, and compare the corresponding values of the finishing time with the entry requirements of different marathons and the WMA records. Best performance for females and males is predicted, respectively, at 26 and 28 years old, which is consistent with [33]. The plateau PLOS ONE DOI: /journal.pone January 28, / 28

14 Table 2. Averaged values of θ j for men (or θ j + δ + ω j for women) for all age groups in minutes. The values have been shifted so that θ 28 = 0 for men. This table is useful to directly compare the finishing time of two runners of any age and gender. We only have to calculate the difference of their respective cells in the table and we will get the time penalty that should be considered (in minutes). age men women age men women age men women age men women age men women age men women doi: /journal.pone t002 afterwards illustrates a stable period of performance between the 30 s and 40 s for both genders. All plots behave in a similar way. This is what we meant in the introduction by that the first insights should not be foreign to us, so experts in marathon modeling can take other conclusions as plausible. Now, we focus on what is different. The most striking difference is how the entry levels penalize younger male runners, specially runners under 25. To be fairer to the youngest runners, their entry time should be raised (in about 7 minutes compared to the 30 years-old). The Boston marathon entry level is perfectly aligned with our second cluster for 40+ years old men and almost perfectly match the female second cluster, except for the runners under 23. There is a penalty of about 4 7 minutes for runners aged and between 7 14 minutes for year-old runners. The entry times slightly favor the year-old runners. The London marathon also penalizes excessively runners in their fifties compared to those in their forties and sixties, which seems odd. It is also clear that over 50 (or even 45), the degradation of the finishing times per year is significant enough to merit a finer scale to guarantee entry times (this may also apply for years-old). For example, a runner of 60 years old is doing almost 15 minutes less than a runner of 64 (which is a very long time in any marathon). Finally, the WMA curve for men penalizes the older male runners, while for older female runners it seems to have a similar trend than the first cluster of the ADDP model. For younger runners, the difference between the typical women in the olympic cluster and the WMA is larger than the difference between the typical men in the olympic cluster and the WMA. Fig 7 shows the finishing time gap between women and men. The gap seems to be of about 30 minutes and slightly increasing with age. There are too few runners over 65 for the final decay to be statistically significant. We can come up with two different plausible explanations, but we do not have data to confirm whether this empirical effect is due to any of them or to some other unknown factor. The degradation with age between women and men might be due to physiological factors, i.e., women age differently than men for long distance running, or it PLOS ONE DOI: /journal.pone January 28, / 28

15 Fig 6. Inferred cluster means and entry requirements. Comparison of the inferred cluster means, i.e., μ k + θ j with k 2 {1,2}, with the entry requirements (runners below the curve can qualify) for New York City, Boston and London marathons. (Top) Men. (Bottom) Women. doi: /journal.pone g006 PLOS ONE DOI: /journal.pone January 28, / 28

16 Fig 7. Gender effect on the final performance. Averaged inferred value for the gender coefficients δ + ω j. doi: /journal.pone g007 might be due to socio-economical factors, i.e., women above 30 cannot train as much as men do. Comparison across multiple races. We compare the NYC marathons to the ones in Boston and London, using the H-ADDP model described in Section 4.2. We consider both the 2010 and 2011 marathons, and we split the runners into age groups instead of using their actual age because we do not have this data available for London marathon. Fig 8 shows the inferred values of the per-race weights π rk. The values for μ k +θ j in the x-axis are those of the male runners and the value of σ x = 19 minutes. First, we notice that the values of π rk are quite different for each place, but they show little variation between different years. We can argue that this pattern is mainly due to the race difficulty, assuming a stationary selection of the runners. Boston has the most striking pattern, which can easily be explained by the strict entry requirement time. For years old the entry time is 3h25m and the cluster with 70% of the runners has a mean of 3h19m20s. There is a group of almost 15% of the runners that finish just under 4 hours and about 15% of the runners that do much worse than their qualifying time. This might be due to poor training or having some issue during the race. For the Boston marathons, there are no runners in the 3h cluster and around 1% runners in the fastest cluster. The void in the 3h group is due to the massive proportion of runners in the 3h19m group, which makes any runner in that group to be represented by the 3h19m cluster. The runners under 3h PLOS ONE DOI: /journal.pone January 28, / 28

17 Fig 8. Mixture weights for the H-ADDP mixture model. The figure shows the mixture weights π rk for each race r and cluster k. The legend shows the different races, and the x-axis corresponds to different clusters. doi: /journal.pone g008 are the runners that do much better than the needed qualifying time and cannot be represented by the massive 3h19m group. The proportions in NYC and London are more similar to each other, as both marathons allow runners to enter the race in more ways than just by entry requirement. Being these two races more accessible or democratic, we can consider the proportions in the different clusters closer to the general population of marathon runners. The 3h group is more populous in London than NYC, but the 3h19m and 3h53m clusters contain a larger proportion of NYC runners. The 4h30m group is equally probable in both races. London seems to attract a higher proportion of slower runners (over 5 hours). This difference might be due to the difficulty of the marathons (profile and weather conditions) or the pull of runners. NYC race is more hilly than London, which can explain the difference in the first cluster, but the runners in NYC are more diverse (coming from different parts of the country and world), while London attracts more local runners. This might also explain the pattern of the slowest athletes. Comparison taking into account the speed. We now apply the model extension in Section 4.2 with cluster-dependent shift delays θ jk. Figs 9 and 10 show the inferred cluster means for men and women respectively. Although the overall shape of the curves is quite similar across clusters, which validates our previous conclusions, there are some noticeable differences for the fastest runners and we concentrate on those. PLOS ONE DOI: /journal.pone January 28, / 28

18 Fig 9. Inferred cluster means μ k + θ jk for men. We have used the extended model with speed-dependent clusters. The legend shows the inferred value of the proportions π k for each cluster. doi: /journal.pone g009 The most interesting difference is the behavior of Clusters 1 and 2. At a first glance, it seems to the naked eye that women are faster than men. Cluster 1 captures the Olympic female runners that are doing under 2h45m, and its proportion is very low (less than 1%). Cluster 2 for women covers those runners doing under 3h45m and it represents 13% of the female population. There is a significant difference between Olympic female runners and competitive female runners, so two clusters are needed. For men under 50, Cluster 2 represents 13% of the runners and it captures those doing under 3h30. The model considers that the Olympic runners can be modeled by the tail of the distribution of competitive runners, i.e., Cluster 2, without requiring a new cluster as the Olympic women need. Male runners over 50 behave as women do, and two clusters are needed to separate the Olympic and the competitive groups. For men under 50, Cluster 1 sits in between the two populous clusters, becoming irrelevant in terms of density estimation. It appears because the model forces the same proportion for all clusters across age and gender groups. In order to support this conclusion we have depicted the histogram of the 28-year-old runners together with the inferred density in logarithm scale in Figs 11 and 12. In those plots, we can see that the Olympic male runners just doing over 2 hours can be modeled by the same cluster as the competitive male runners, while the finishing time for the Olympic female runners could not be explained by the competitive female ones, PLOS ONE DOI: /journal.pone January 28, / 28

19 Fig 10. Inferred cluster means μ k + θ jk for women. We have used the extended model with speed-dependent clusters. The legend shows the inferred value of the proportions π k for each cluster. doi: /journal.pone g010 and hence a specific cluster is needed. In previous sections, θ j was not allowed to vary with k. The cluster for the Olympic males was then visible, but this is an effect of forcing the same value of θ j for all clusters, since women and older male runners need such cluster. This is the only significant difference when we replace θ j with θ jk. In short, Cluster 1 (with low weight) is describing a different local feature for men under 50 than for men over 50 and women of all ages, and the Olympic male runners below 50 are found in the tail of Cluster 2 instead of Cluster 1 as in the other age/gender groups. There may be several explanations for this effect. Here we present some hypothesis that should be confirmed with further studies. First, in the NYC marathon the Olympic women run by themselves in an early wave, while the Olympic men start at the same time as everyone else, so competitive men can try to follow them. However, this does not explain the need for a cluster for fast male runners over 50. Second, we can also hypothesize that female Olympic runners have a training that is significantly different from competitive female runners, while for male runners there is a continuum in the training between Olympic and competitive runners. This could also apply for male over 50, in which there are not that many doing Olympic finishing times and competitive runners are not as strong. Finally, another plausible hypothesis is that younger male runners are more risky than female and older male runners. Those that succeed PLOS ONE DOI: /journal.pone January 28, / 28

20 Fig 11. Density estimation for 28-year-old male runners. We have used the cluster-dependent ADDP model, where cluster positions are speeddependent. The blue solid line represents the inferred distribution and the green dash-dotted line is the normalized histogram with 6-minute bins. Black and red lines correspond to the individual Gaussian components that define the inferred density, weighted by their averaged weight. doi: /journal.pone g011 do a better time and close the gap between the Olympic and competitive runners. Older males and female competitive runners do not follow such a risky approach and therefore they do not close the gap with the Olympic runners. There is some evidence on this risky hypothesis in Section 6.2, in which we see that the clusters with reckless running patterns are mainly populated by younger males. 6.2 Modeling latent running patterns Here, we consider temporal sequences of time measurements every 5 km, and at half and full marathon, as explained in Section 5. Running patterns. In this section, we use the data from NYC marathons, with 194,778 runners. We discarded data from the 2006 marathon because we observed that intermediate measurements were not fully synchronized with the half and full marathon times. After applying our HDP model, we found a modal value of 46 clusters. In Fig 13 we show the twelve most populated clusters, which account for around 90% of the population on average. The remaining clusters do not behave significantly different than the ones we show in this section. For clarity, we do not directly plot the time proportion spent at each interval, but instead PLOS ONE DOI: /journal.pone January 28, / 28

Legendre et al Appendices and Supplements, p. 1

Legendre et al Appendices and Supplements, p. 1 Legendre et al. 2010 Appendices and Supplements, p. 1 Appendices and Supplement to: Legendre, P., M. De Cáceres, and D. Borcard. 2010. Community surveys through space and time: testing the space-time interaction

More information

Opleiding Informatica

Opleiding Informatica Opleiding Informatica Determining Good Tactics for a Football Game using Raw Positional Data Davey Verhoef Supervisors: Arno Knobbe Rens Meerhoff BACHELOR THESIS Leiden Institute of Advanced Computer Science

More information

PGA Tour Scores as a Gaussian Random Variable

PGA Tour Scores as a Gaussian Random Variable PGA Tour Scores as a Gaussian Random Variable Robert D. Grober Departments of Applied Physics and Physics Yale University, New Haven, CT 06520 Abstract In this paper it is demonstrated that the scoring

More information

A Novel Approach to Predicting the Results of NBA Matches

A Novel Approach to Predicting the Results of NBA Matches A Novel Approach to Predicting the Results of NBA Matches Omid Aryan Stanford University aryano@stanford.edu Ali Reza Sharafat Stanford University sharafat@stanford.edu Abstract The current paper presents

More information

Running head: DATA ANALYSIS AND INTERPRETATION 1

Running head: DATA ANALYSIS AND INTERPRETATION 1 Running head: DATA ANALYSIS AND INTERPRETATION 1 Data Analysis and Interpretation Final Project Vernon Tilly Jr. University of Central Oklahoma DATA ANALYSIS AND INTERPRETATION 2 Owners of the various

More information

Marathon Performance Prediction of Amateur Runners based on Training Session Data

Marathon Performance Prediction of Amateur Runners based on Training Session Data Marathon Performance Prediction of Amateur Runners based on Training Session Data Daniel Ruiz-Mayo, Estrella Pulido, and Gonzalo Martínez-Muñoz Escuela Politécnica Superior, Universidad Autónoma de Madrid,

More information

PREDICTING the outcomes of sporting events

PREDICTING the outcomes of sporting events CS 229 FINAL PROJECT, AUTUMN 2014 1 Predicting National Basketball Association Winners Jasper Lin, Logan Short, and Vishnu Sundaresan Abstract We used National Basketball Associations box scores from 1991-1998

More information

Evaluating the Influence of R3 Treatments on Fishing License Sales in Pennsylvania

Evaluating the Influence of R3 Treatments on Fishing License Sales in Pennsylvania Evaluating the Influence of R3 Treatments on Fishing License Sales in Pennsylvania Prepared for the: Pennsylvania Fish and Boat Commission Produced by: PO Box 6435 Fernandina Beach, FL 32035 Tel (904)

More information

COMPLETING THE RESULTS OF THE 2013 BOSTON MARATHON

COMPLETING THE RESULTS OF THE 2013 BOSTON MARATHON COMPLETING THE RESULTS OF THE 2013 BOSTON MARATHON Dorit Hammerling 1, Matthew Cefalu 2, Jessi Cisewski 3, Francesca Dominici 2, Giovanni Parmigiani 2,4, Charles Paulson 5, Richard Smith 1,6 1 Statistical

More information

b

b Empirically Derived Breaking Strengths for Basket Hitches and Wrap Three Pull Two Webbing Anchors Thomas Evans a and Aaron Stavens b a Montana State University, Department of Earth Sciences, PO Box 173480,

More information

The Intrinsic Value of a Batted Ball Technical Details

The Intrinsic Value of a Batted Ball Technical Details The Intrinsic Value of a Batted Ball Technical Details Glenn Healey, EECS Department University of California, Irvine, CA 9617 Given a set of observed batted balls and their outcomes, we develop a method

More information

Queue analysis for the toll station of the Öresund fixed link. Pontus Matstoms *

Queue analysis for the toll station of the Öresund fixed link. Pontus Matstoms * Queue analysis for the toll station of the Öresund fixed link Pontus Matstoms * Abstract A new simulation model for queue and capacity analysis of a toll station is presented. The model and its software

More information

Online Companion to Using Simulation to Help Manage the Pace of Play in Golf

Online Companion to Using Simulation to Help Manage the Pace of Play in Golf Online Companion to Using Simulation to Help Manage the Pace of Play in Golf MoonSoo Choi Industrial Engineering and Operations Research, Columbia University, New York, NY, USA {moonsoo.choi@columbia.edu}

More information

SPATIAL STATISTICS A SPATIAL ANALYSIS AND COMPARISON OF NBA PLAYERS. Introduction

SPATIAL STATISTICS A SPATIAL ANALYSIS AND COMPARISON OF NBA PLAYERS. Introduction A SPATIAL ANALYSIS AND COMPARISON OF NBA PLAYERS KELLIN RUMSEY Introduction The 2016 National Basketball Association championship featured two of the leagues biggest names. The Golden State Warriors Stephen

More information

Volume 37, Issue 3. Elite marathon runners: do East Africans utilize different strategies than the rest of the world?

Volume 37, Issue 3. Elite marathon runners: do East Africans utilize different strategies than the rest of the world? Volume 37, Issue 3 Elite marathon runners: do s utilize different strategies than the rest of the world? Jamie Emerson Salisbury University Brian Hill Salisbury University Abstract This paper investigates

More information

LOW PRESSURE EFFUSION OF GASES revised by Igor Bolotin 03/05/12

LOW PRESSURE EFFUSION OF GASES revised by Igor Bolotin 03/05/12 LOW PRESSURE EFFUSION OF GASES revised by Igor Bolotin 03/05/ This experiment will introduce you to the kinetic properties of low-pressure gases. You will make observations on the rates with which selected

More information

b

b Empirically Derived Breaking Strengths for Basket Hitches and Wrap Three Pull Two Webbing Anchors Thomas Evans a and Aaron Stavens b a Montana State University, Department of Earth Sciences, PO Box 173480,

More information

Introduction to Pattern Recognition

Introduction to Pattern Recognition Introduction to Pattern Recognition Jason Corso SUNY at Buffalo 12 January 2009 J. Corso (SUNY at Buffalo) Introduction to Pattern Recognition 12 January 2009 1 / 28 Pattern Recognition By Example Example:

More information

A Hare-Lynx Simulation Model

A Hare-Lynx Simulation Model 1 A Hare- Simulation Model What happens to the numbers of hares and lynx when the core of the system is like this? Hares O Balance? S H_Births Hares H_Fertility Area KillsPerHead Fertility Births Figure

More information

An Application of Signal Detection Theory for Understanding Driver Behavior at Highway-Rail Grade Crossings

An Application of Signal Detection Theory for Understanding Driver Behavior at Highway-Rail Grade Crossings An Application of Signal Detection Theory for Understanding Driver Behavior at Highway-Rail Grade Crossings Michelle Yeh and Jordan Multer United States Department of Transportation Volpe National Transportation

More information

Minimum Mean-Square Error (MMSE) and Linear MMSE (LMMSE) Estimation

Minimum Mean-Square Error (MMSE) and Linear MMSE (LMMSE) Estimation Minimum Mean-Square Error (MMSE) and Linear MMSE (LMMSE) Estimation Outline: MMSE estimation, Linear MMSE (LMMSE) estimation, Geometric formulation of LMMSE estimation and orthogonality principle. Reading:

More information

Completing the Results of the 2013 Boston Marathon

Completing the Results of the 2013 Boston Marathon Completing the Results of the 2013 Boston Marathon The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters. Citation Published Version

More information

Chapter 20. Planning Accelerated Life Tests. William Q. Meeker and Luis A. Escobar Iowa State University and Louisiana State University

Chapter 20. Planning Accelerated Life Tests. William Q. Meeker and Luis A. Escobar Iowa State University and Louisiana State University Chapter 20 Planning Accelerated Life Tests William Q. Meeker and Luis A. Escobar Iowa State University and Louisiana State University Copyright 1998-2008 W. Q. Meeker and L. A. Escobar. Based on the authors

More information

LOW PRESSURE EFFUSION OF GASES adapted by Luke Hanley and Mike Trenary

LOW PRESSURE EFFUSION OF GASES adapted by Luke Hanley and Mike Trenary ADH 1/7/014 LOW PRESSURE EFFUSION OF GASES adapted by Luke Hanley and Mike Trenary This experiment will introduce you to the kinetic properties of low-pressure gases. You will make observations on the

More information

Influence of the size of a nation s population on performances in athletics

Influence of the size of a nation s population on performances in athletics Influence of the size of a nation s population on performances in athletics N.P. Linthorne* *Centre for Sports Medicine and Human Performance, School of Sport and Education, Brunel University, Uxbridge,

More information

GLMM standardisation of the commercial abalone CPUE for Zones A-D over the period

GLMM standardisation of the commercial abalone CPUE for Zones A-D over the period GLMM standardisation of the commercial abalone for Zones A-D over the period 1980 2015 Anabela Brandão and Doug S. Butterworth Marine Resource Assessment & Management Group (MARAM) Department of Mathematics

More information

Title: Modeling Crossing Behavior of Drivers and Pedestrians at Uncontrolled Intersections and Mid-block Crossings

Title: Modeling Crossing Behavior of Drivers and Pedestrians at Uncontrolled Intersections and Mid-block Crossings Title: Modeling Crossing Behavior of Drivers and Pedestrians at Uncontrolled Intersections and Mid-block Crossings Objectives The goal of this study is to advance the state of the art in understanding

More information

Calculation of Trail Usage from Counter Data

Calculation of Trail Usage from Counter Data 1. Introduction 1 Calculation of Trail Usage from Counter Data 1/17/17 Stephen Martin, Ph.D. Automatic counters are used on trails to measure how many people are using the trail. A fundamental question

More information

OPERATIONAL AMV PRODUCTS DERIVED WITH METEOSAT-6 RAPID SCAN DATA. Arthur de Smet. EUMETSAT, Am Kavalleriesand 31, D Darmstadt, Germany ABSTRACT

OPERATIONAL AMV PRODUCTS DERIVED WITH METEOSAT-6 RAPID SCAN DATA. Arthur de Smet. EUMETSAT, Am Kavalleriesand 31, D Darmstadt, Germany ABSTRACT OPERATIONAL AMV PRODUCTS DERIVED WITH METEOSAT-6 RAPID SCAN DATA Arthur de Smet EUMETSAT, Am Kavalleriesand 31, D-64295 Darmstadt, Germany ABSTRACT EUMETSAT started its Rapid Scanning Service on September

More information

Honest Mirror: Quantitative Assessment of Player Performances in an ODI Cricket Match

Honest Mirror: Quantitative Assessment of Player Performances in an ODI Cricket Match Honest Mirror: Quantitative Assessment of Player Performances in an ODI Cricket Match Madan Gopal Jhawar 1 and Vikram Pudi 2 1 Microsoft, India majhawar@microsoft.com 2 IIIT Hyderabad, India vikram@iiit.ac.in

More information

Finding your feet: modelling the batting abilities of cricketers using Gaussian processes

Finding your feet: modelling the batting abilities of cricketers using Gaussian processes Finding your feet: modelling the batting abilities of cricketers using Gaussian processes Oliver Stevenson & Brendon Brewer PhD candidate, Department of Statistics, University of Auckland o.stevenson@auckland.ac.nz

More information

A Combined Recruitment Index for Demersal Juvenile Cod in NAFO Divisions 3K and 3L

A Combined Recruitment Index for Demersal Juvenile Cod in NAFO Divisions 3K and 3L NAFO Sci. Coun. Studies, 29: 23 29 A Combined Recruitment Index for Demersal Juvenile Cod in NAFO Divisions 3K and 3L David C. Schneider Ocean Sciences Centre, Memorial University St. John's, Newfoundland,

More information

At each type of conflict location, the risk is affected by certain parameters:

At each type of conflict location, the risk is affected by certain parameters: TN001 April 2016 The separated cycleway options tool (SCOT) was developed to partially address some of the gaps identified in Stage 1 of the Cycling Network Guidance project relating to separated cycleways.

More information

Introduction to Pattern Recognition

Introduction to Pattern Recognition Introduction to Pattern Recognition Jason Corso SUNY at Buffalo 19 January 2011 J. Corso (SUNY at Buffalo) Introduction to Pattern Recognition 19 January 2011 1 / 32 Examples of Pattern Recognition in

More information

Trouble With The Curve: Improving MLB Pitch Classification

Trouble With The Curve: Improving MLB Pitch Classification Trouble With The Curve: Improving MLB Pitch Classification Michael A. Pane Samuel L. Ventura Rebecca C. Steorts A.C. Thomas arxiv:134.1756v1 [stat.ap] 5 Apr 213 April 8, 213 Abstract The PITCHf/x database

More information

UNDERSTANDING A DIVE COMPUTER. by S. Angelini, Ph.D. Mares S.p.A.

UNDERSTANDING A DIVE COMPUTER. by S. Angelini, Ph.D. Mares S.p.A. UNDERSTANDING A DIVE COMPUTER by S. Angelini, Ph.D. Mares S.p.A. Dive Computer UNDERSTANDING A DIVE COMPUTER The decompression algorithm in a dive computer is an attempt to replicate the effects of a dive

More information

Extreme Shooters in the NBA

Extreme Shooters in the NBA Extreme Shooters in the NBA Manav Kant and Lisa R. Goldberg 1. Introduction December 24, 2018 It is widely perceived that star shooters in basketball have hot and cold streaks. This perception was first

More information

Chapter 5: Methods and Philosophy of Statistical Process Control

Chapter 5: Methods and Philosophy of Statistical Process Control Chapter 5: Methods and Philosophy of Statistical Process Control Learning Outcomes After careful study of this chapter You should be able to: Understand chance and assignable causes of variation, Explain

More information

Citation for published version (APA): Canudas Romo, V. (2003). Decomposition Methods in Demography Groningen: s.n.

Citation for published version (APA): Canudas Romo, V. (2003). Decomposition Methods in Demography Groningen: s.n. University of Groningen Decomposition Methods in Demography Canudas Romo, Vladimir IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please

More information

EE 364B: Wind Farm Layout Optimization via Sequential Convex Programming

EE 364B: Wind Farm Layout Optimization via Sequential Convex Programming EE 364B: Wind Farm Layout Optimization via Sequential Convex Programming Jinkyoo Park 1 Introduction In a wind farm, the wakes formed by upstream wind turbines decrease the power outputs of downstream

More information

ABSTRACT AUTHOR. Kinematic Analysis of the Women's 400m Hurdles. by Kenny Guex. he women's 400m hurdles is a relatively

ABSTRACT AUTHOR. Kinematic Analysis of the Women's 400m Hurdles. by Kenny Guex. he women's 400m hurdles is a relatively Study Kinematic Analysis of the Women's 400m Hurdles by IAAF 27:1/2; 41-51, 2012 by Kenny Guex ABSTRACT The women's 400m hurdles is a relatively new discipline and a complex event that cannot be approached

More information

5.1 Introduction. Learning Objectives

5.1 Introduction. Learning Objectives Learning Objectives 5.1 Introduction Statistical Process Control (SPC): SPC is a powerful collection of problem-solving tools useful in achieving process stability and improving capability through the

More information

Assessment Schedule 2016 Mathematics and Statistics: Demonstrate understanding of chance and data (91037)

Assessment Schedule 2016 Mathematics and Statistics: Demonstrate understanding of chance and data (91037) NCEA Level 1 Mathematics and Statistics (91037) 2016 page 1 of 8 Assessment Schedule 2016 Mathematics and Statistics: Demonstrate understanding of chance and data (91037) Evidence Statement One Expected

More information

Analysis of Variance. Copyright 2014 Pearson Education, Inc.

Analysis of Variance. Copyright 2014 Pearson Education, Inc. Analysis of Variance 12-1 Learning Outcomes Outcome 1. Understand the basic logic of analysis of variance. Outcome 2. Perform a hypothesis test for a single-factor design using analysis of variance manually

More information

A N E X P L O R AT I O N W I T H N E W Y O R K C I T Y TA X I D ATA S E T

A N E X P L O R AT I O N W I T H N E W Y O R K C I T Y TA X I D ATA S E T A N E X P L O R AT I O N W I T H N E W Y O R K C I T Y TA X I D ATA S E T GAO, Zheng 26 May 2016 Abstract The data analysis is two-part: an exploratory data analysis, and an attempt to answer an inferential

More information

EVALUATION OF ENVISAT ASAR WAVE MODE RETRIEVAL ALGORITHMS FOR SEA-STATE FORECASTING AND WAVE CLIMATE ASSESSMENT

EVALUATION OF ENVISAT ASAR WAVE MODE RETRIEVAL ALGORITHMS FOR SEA-STATE FORECASTING AND WAVE CLIMATE ASSESSMENT EVALUATION OF ENVISAT ASAR WAVE MODE RETRIEVAL ALGORITHMS FOR SEA-STATE FORECASTING AND WAVE CLIMATE ASSESSMENT F.J. Melger ARGOSS, P.O. Box 61, 8335 ZH Vollenhove, the Netherlands, Email: info@argoss.nl

More information

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate Mixture Models & EM Nicholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looked at -means and hierarchical clustering as mechanisms for unsupervised learning

More information

The Incremental Evolution of Gaits for Hexapod Robots

The Incremental Evolution of Gaits for Hexapod Robots The Incremental Evolution of Gaits for Hexapod Robots Abstract Gait control programs for hexapod robots are learned by incremental evolution. The first increment is used to learn the activations required

More information

The Scrum Guide. The Definitive Guide to Scrum: The Rules of the Game. October Developed and sustained by Ken Schwaber and Jeff Sutherland

The Scrum Guide. The Definitive Guide to Scrum: The Rules of the Game. October Developed and sustained by Ken Schwaber and Jeff Sutherland The Scrum Guide The Definitive Guide to Scrum: The Rules of the Game October 2011 Developed and sustained by Ken Schwaber and Jeff Sutherland Table of Contents Purpose of the Scrum Guide... 3 Scrum Overview...

More information

Tutorial for the. Total Vertical Uncertainty Analysis Tool in NaviModel3

Tutorial for the. Total Vertical Uncertainty Analysis Tool in NaviModel3 Tutorial for the Total Vertical Uncertainty Analysis Tool in NaviModel3 May, 2011 1. Introduction The Total Vertical Uncertainty Analysis Tool in NaviModel3 has been designed to facilitate a determination

More information

Spatial Methods for Road Course Measurement

Spatial Methods for Road Course Measurement Page 1 of 10 CurtinSearch Curtin Site Index Contact Details Links LASCAN Spatial Sciences WA Centre for Geodesy COURSE MEASUREMENT This page is a summary of results of some of the research we have recently

More information

Traffic safety developments in Poland

Traffic safety developments in Poland Traffic safety developments in Poland Siem Oppe D-2001-8 Traffic safety developments in Poland A research note D-2001-8 Siem Oppe Leidschendam, 2001 SWOV Institute for Road Safety Research, The Netherlands

More information

A REVIEW OF AGE ADJUSTMENT FOR MASTERS SWIMMERS

A REVIEW OF AGE ADJUSTMENT FOR MASTERS SWIMMERS A REVIEW OF ADJUSTMENT FOR MASTERS SWIMMERS Written by Alan Rowson Copyright 2013 Alan Rowson Last Saved on 28-Apr-13 page 1 of 10 INTRODUCTION In late 2011 and early 2012, in conjunction with Anthony

More information

CHAPTER 2 Modeling Distributions of Data

CHAPTER 2 Modeling Distributions of Data CHAPTER 2 Modeling Distributions of Data 2.2 Density Curves and Normal Distributions The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Density Curves

More information

An Empirical Comparison of Regression Analysis Strategies with Discrete Ordinal Variables

An Empirical Comparison of Regression Analysis Strategies with Discrete Ordinal Variables Kromrey & Rendina-Gobioff An Empirical Comparison of Regression Analysis Strategies with Discrete Ordinal Variables Jeffrey D. Kromrey Gianna Rendina-Gobioff University of South Florida The Type I error

More information

Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA

Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only

More information

Simulating Major League Baseball Games

Simulating Major League Baseball Games ABSTRACT Paper 2875-2018 Simulating Major League Baseball Games Justin Long, Slippery Rock University; Brad Schweitzer, Slippery Rock University; Christy Crute Ph.D, Slippery Rock University The game of

More information

Available online at ScienceDirect. Procedia Engineering 112 (2015 )

Available online at  ScienceDirect. Procedia Engineering 112 (2015 ) Available online at www.sciencedirect.com ScienceDirect Procedia Engineering 112 (2015 ) 540 545 7th Asia-Pacific Congress on Sports Technology, APCST 2015 Movement variability of professional pool billiards

More information

Using Markov Chains to Analyze a Volleyball Rally

Using Markov Chains to Analyze a Volleyball Rally 1 Introduction Using Markov Chains to Analyze a Volleyball Rally Spencer Best Carthage College sbest@carthage.edu November 3, 212 Abstract We examine a volleyball rally between two volleyball teams. Using

More information

Modelling and Simulation of Environmental Disturbances

Modelling and Simulation of Environmental Disturbances Modelling and Simulation of Environmental Disturbances (Module 5) Dr Tristan Perez Centre for Complex Dynamic Systems and Control (CDSC) Prof. Thor I Fossen Department of Engineering Cybernetics 18/09/2007

More information

Application of Bayesian Networks to Shopping Assistance

Application of Bayesian Networks to Shopping Assistance Application of Bayesian Networks to Shopping Assistance Yang Xiang, Chenwen Ye, and Deborah Ann Stacey University of Guelph, CANADA Abstract. We develop an on-line shopping assistant that can help a e-shopper

More information

Assessment Summary Report Gulf of Mexico Red Snapper SEDAR 7

Assessment Summary Report Gulf of Mexico Red Snapper SEDAR 7 Assessment Summary Report Gulf of Mexico Red Snapper SEDAR 7 Stock Distribution: Red snapper are found throughout the Gulf of Mexico, the Caribbean Sea, and from the U.S. Atlantic Coast to northern South

More information

In memory of Dr. Kevin P. Granata, my graduate advisor, who was killed protecting others on the morning of April 16, 2007.

In memory of Dr. Kevin P. Granata, my graduate advisor, who was killed protecting others on the morning of April 16, 2007. Acknowledgement In memory of Dr. Kevin P. Granata, my graduate advisor, who was killed protecting others on the morning of April 16, 2007. There are many others without whom I could not have completed

More information

OPTIMIZATION OF A WAVE CANCELLATION MULTIHULL SHIP USING CFD TOOLS

OPTIMIZATION OF A WAVE CANCELLATION MULTIHULL SHIP USING CFD TOOLS OPTIMIZATION OF A AVE CANCELLATION MULTIHULL SHIP USING CFD TOOLS C. Yang, R. Löhner and O. Soto School of Computational Sciences, George Mason University Fairfax VA 030-4444, USA ABSTRACT A simple CFD

More information

Tokyo: Simulating Hyperpath-Based Vehicle Navigations and its Impact on Travel Time Reliability

Tokyo: Simulating Hyperpath-Based Vehicle Navigations and its Impact on Travel Time Reliability CHAPTER 92 Tokyo: Simulating Hyperpath-Based Vehicle Navigations and its Impact on Travel Time Reliability Daisuke Fukuda, Jiangshan Ma, Kaoru Yamada and Norihito Shinkai 92.1 Introduction Most standard

More information

Evaluating NBA Shooting Ability using Shot Location

Evaluating NBA Shooting Ability using Shot Location Evaluating NBA Shooting Ability using Shot Location Dennis Lock December 16, 2013 There are many statistics that evaluate the performance of NBA players, including some that attempt to measure a players

More information

Walk - Run Activity --An S and P Wave Travel Time Simulation ( S minus P Earthquake Location Method)

Walk - Run Activity --An S and P Wave Travel Time Simulation ( S minus P Earthquake Location Method) Walk - Run Activity --An S and P Wave Travel Time Simulation ( S minus P Earthquake Location Method) L. W. Braile and S. J. Braile (June, 2000) braile@purdue.edu http://web.ics.purdue.edu/~braile Walk

More information

#19 MONITORING AND PREDICTING PEDESTRIAN BEHAVIOR USING TRAFFIC CAMERAS

#19 MONITORING AND PREDICTING PEDESTRIAN BEHAVIOR USING TRAFFIC CAMERAS #19 MONITORING AND PREDICTING PEDESTRIAN BEHAVIOR USING TRAFFIC CAMERAS Final Research Report Luis E. Navarro-Serment, Ph.D. The Robotics Institute Carnegie Mellon University November 25, 2018. Disclaimer

More information

Richard S. Marken The RAND Corporation

Richard S. Marken The RAND Corporation Fielder s Choice: A Unified Theory of Catching Fly Balls Richard S. Marken The RAND Corporation February 11, 2002 Mailing Address: 1700 Main Street Santa Monica, CA 90407-2138 Phone: 310 393-0411 ext 7971

More information

4.1 Why is the Equilibrium Diameter Important?

4.1 Why is the Equilibrium Diameter Important? Chapter 4 Equilibrium Calculations 4.1 Why is the Equilibrium Diameter Important? The model described in Chapter 2 provides information on the thermodynamic state of the system and the droplet size distribution.

More information

Two Machine Learning Approaches to Understand the NBA Data

Two Machine Learning Approaches to Understand the NBA Data Two Machine Learning Approaches to Understand the NBA Data Panagiotis Lolas December 14, 2017 1 Introduction In this project, I consider applications of machine learning in the analysis of nba data. To

More information

Determining bicycle infrastructure preferences A case study of Dublin

Determining bicycle infrastructure preferences A case study of Dublin *Manuscript Click here to view linked References 1 Determining bicycle infrastructure preferences A case study of Dublin Brian Caulfield 1, Elaine Brick 2, Orla Thérèse McCarthy 1 1 Department of Civil,

More information

1999 On-Board Sacramento Regional Transit District Survey

1999 On-Board Sacramento Regional Transit District Survey SACOG-00-009 1999 On-Board Sacramento Regional Transit District Survey June 2000 Sacramento Area Council of Governments 1999 On-Board Sacramento Regional Transit District Survey June 2000 Table of Contents

More information

WHAT CAN WE LEARN FROM COMPETITION ANALYSIS AT THE 1999 PAN PACIFIC SWIMMING CHAMPIONSHIPS?

WHAT CAN WE LEARN FROM COMPETITION ANALYSIS AT THE 1999 PAN PACIFIC SWIMMING CHAMPIONSHIPS? WHAT CAN WE LEARN FROM COMPETITION ANALYSIS AT THE 1999 PAN PACIFIC SWIMMING CHAMPIONSHIPS? Bruce Mason and Jodi Cossor Biomechanics Department, Australian Institute of Sport, Canberra, Australia An analysis

More information

Naval Postgraduate School, Operational Oceanography and Meteorology. Since inputs from UDAS are continuously used in projects at the Naval

Naval Postgraduate School, Operational Oceanography and Meteorology. Since inputs from UDAS are continuously used in projects at the Naval How Accurate are UDAS True Winds? Charles L Williams, LT USN September 5, 2006 Naval Postgraduate School, Operational Oceanography and Meteorology Abstract Since inputs from UDAS are continuously used

More information

Lab 4: Root Locus Based Control Design

Lab 4: Root Locus Based Control Design Lab 4: Root Locus Based Control Design References: Franklin, Powell and Emami-Naeini. Feedback Control of Dynamic Systems, 3 rd ed. Addison-Wesley, Massachusetts: 1994. Ogata, Katsuhiko. Modern Control

More information

A Nomogram Of Performances In Endurance Running Based On Logarithmic Model Of Péronnet-Thibault

A Nomogram Of Performances In Endurance Running Based On Logarithmic Model Of Péronnet-Thibault American Journal of Engineering Research (AJER) e-issn: 2320-0847 p-issn : 2320-0936 Volume-6, Issue-9, pp-78-85 www.ajer.org Research Paper Open Access A Nomogram Of Performances In Endurance Running

More information

TOPIC 10: BASIC PROBABILITY AND THE HOT HAND

TOPIC 10: BASIC PROBABILITY AND THE HOT HAND TOPIC 0: BASIC PROBABILITY AND THE HOT HAND The Hot Hand Debate Let s start with a basic question, much debated in sports circles: Does the Hot Hand really exist? A number of studies on this topic can

More information

Business and housing market cycles in the euro area: a multivariate unobserved component approach

Business and housing market cycles in the euro area: a multivariate unobserved component approach Business and housing market cycles in the euro area: a multivariate unobserved component approach Laurent Ferrara (a) and Siem Jan Koopman (b) http://staff.feweb.vu.nl/koopman (a) Banque de France (b)

More information

ENGINEERING ANALYSIS OF THOROUGHBRED RACING. Rubin Boxer. Revelation Software

ENGINEERING ANALYSIS OF THOROUGHBRED RACING. Rubin Boxer. Revelation Software ENGINEERING ANALYSIS OF THOROUGHBRED RACING by Rubin Boxer Revelation Software P.O. Box 21344 Santa Barbara, CA 93121-1344 Phone: (805) 962-9045 www.revelationprofits.com Copyright 1997-2004 Revelation

More information

Probabilistic Modelling in Multi-Competitor Games. Christopher J. Farmer

Probabilistic Modelling in Multi-Competitor Games. Christopher J. Farmer Probabilistic Modelling in Multi-Competitor Games Christopher J. Farmer Master of Science Division of Informatics University of Edinburgh 2003 Abstract Previous studies in to predictive modelling of sports

More information

Journal of Quantitative Analysis in Sports

Journal of Quantitative Analysis in Sports Journal of Quantitative Analysis in Sports Volume 1, Issue 1 2005 Article 5 Determinants of Success in the Olympic Decathlon: Some Statistical Evidence Ian Christopher Kenny Dan Sprevak Craig Sharp Colin

More information

Compression Study: City, State. City Convention & Visitors Bureau. Prepared for

Compression Study: City, State. City Convention & Visitors Bureau. Prepared for : City, State Prepared for City Convention & Visitors Bureau Table of Contents City Convention & Visitors Bureau... 1 Executive Summary... 3 Introduction... 4 Approach and Methodology... 4 General Characteristics

More information

Women and Marathons: A Low Participation. Recreation Research Proposal. PRM 447 Research and Evaluation in PRM. Jaimie Coastman.

Women and Marathons: A Low Participation. Recreation Research Proposal. PRM 447 Research and Evaluation in PRM. Jaimie Coastman. Running WOMEN head: AND WOMEN MARATHONS: AND MARATHONS: A LOW PARTICIPATION A LOW 1 PARTICIPATION Women and Marathons: A Low Participation Recreation Research Proposal PRM 447 Research and Evaluation in

More information

Optimal Weather Routing Using Ensemble Weather Forecasts

Optimal Weather Routing Using Ensemble Weather Forecasts Optimal Weather Routing Using Ensemble Weather Forecasts Asher Treby Department of Engineering Science University of Auckland New Zealand Abstract In the United States and the United Kingdom it is commonplace

More information

A Game Theoretic Study of Attack and Defense in Cyber-Physical Systems

A Game Theoretic Study of Attack and Defense in Cyber-Physical Systems The First International Workshop on Cyber-Physical Networking Systems A Game Theoretic Study of Attack and Defense in Cyber-Physical Systems ChrisY.T.Ma Advanced Digital Sciences Center Illinois at Singapore

More information

The final set in a tennis match: four years at Wimbledon 1

The final set in a tennis match: four years at Wimbledon 1 Published as: Journal of Applied Statistics, Vol. 26, No. 4, 1999, 461-468. The final set in a tennis match: four years at Wimbledon 1 Jan R. Magnus, CentER, Tilburg University, the Netherlands and Franc

More information

DEPARTMENT OF THE NAVY DIVISION NEWPORT OFFICE OF COUNSEL PHONE: FAX: DSN:

DEPARTMENT OF THE NAVY DIVISION NEWPORT OFFICE OF COUNSEL PHONE: FAX: DSN: IMAVSBA WARFARE CENTERS NEWPORT DEPARTMENT OF THE NAVY NAVAL UNDERSEA WARFARE CENTER DIVISION NEWPORT OFFICE OF COUNSEL PHONE: 401 832-3653 FAX: 401 832-4432 DSN: 432-3653 Attorney Docket No. 85031 Date:

More information

MANITOBA'S ABORIGINAL COMMUNITY: A 2001 TO 2026 POPULATION & DEMOGRAPHIC PROFILE

MANITOBA'S ABORIGINAL COMMUNITY: A 2001 TO 2026 POPULATION & DEMOGRAPHIC PROFILE MANITOBA'S ABORIGINAL COMMUNITY: A 2001 TO 2026 POPULATION & DEMOGRAPHIC PROFILE MBS 2005-4 JULY 2005 TABLE OF CONTENTS I. Executive Summary 3 II. Introduction.. 9 PAGE III. IV. Projected Aboriginal Identity

More information

Effects of directionality on wind load and response predictions

Effects of directionality on wind load and response predictions Effects of directionality on wind load and response predictions Seifu A. Bekele 1), John D. Holmes 2) 1) Global Wind Technology Services, 205B, 434 St Kilda Road, Melbourne, Victoria 3004, Australia, seifu@gwts.com.au

More information

ROSE-HULMAN INSTITUTE OF TECHNOLOGY Department of Mechanical Engineering. Mini-project 3 Tennis ball launcher

ROSE-HULMAN INSTITUTE OF TECHNOLOGY Department of Mechanical Engineering. Mini-project 3 Tennis ball launcher Mini-project 3 Tennis ball launcher Mini-Project 3 requires you to use MATLAB to model the trajectory of a tennis ball being shot from a tennis ball launcher to a player. The tennis ball trajectory model

More information

PUV Wave Directional Spectra How PUV Wave Analysis Works

PUV Wave Directional Spectra How PUV Wave Analysis Works PUV Wave Directional Spectra How PUV Wave Analysis Works Introduction The PUV method works by comparing velocity and pressure time series. Figure 1 shows that pressure and velocity (in the direction of

More information

EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 6. Wenbing Zhao. Department of Electrical and Computer Engineering

EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 6. Wenbing Zhao. Department of Electrical and Computer Engineering EEC 686/785 Modeling & Performance Evaluation of Computer Systems Lecture 6 Department of Electrical and Computer Engineering Cleveland State University wenbing@ieee.org Outline 2 Review of lecture 5 The

More information

Deciding When to Quit: Reference-Dependence over Slot Machine Outcomes

Deciding When to Quit: Reference-Dependence over Slot Machine Outcomes Deciding When to Quit: Reference-Dependence over Slot Machine Outcomes By JAIMIE W. LIEN * AND JIE ZHENG * Lien: Department of Economics, School of Economics and Management, Tsinghua University, Beijing,

More information

NBA TEAM SYNERGY RESEARCH REPORT 1

NBA TEAM SYNERGY RESEARCH REPORT 1 NBA TEAM SYNERGY RESEARCH REPORT 1 NBA Team Synergy and Style of Play Analysis Karrie Lopshire, Michael Avendano, Amy Lee Wang University of California Los Angeles June 3, 2016 NBA TEAM SYNERGY RESEARCH

More information

GENETICS OF RACING PERFORMANCE IN THE AMERICAN QUARTER HORSE: II. ADJUSTMENT FACTORS AND CONTEMPORARY GROUPS 1'2

GENETICS OF RACING PERFORMANCE IN THE AMERICAN QUARTER HORSE: II. ADJUSTMENT FACTORS AND CONTEMPORARY GROUPS 1'2 GENETICS OF RACING PERFORMANCE IN THE AMERICAN QUARTER HORSE: II. ADJUSTMENT FACTORS AND CONTEMPORARY GROUPS 1'2 S. T. Buttram 3, R. L. Willham 4 and D. E. Wilson 4 Iowa State University, Ames 50011 ABSTRACT

More information

A Comparison of Trend Scoring and IRT Linking in Mixed-Format Test Equating

A Comparison of Trend Scoring and IRT Linking in Mixed-Format Test Equating A Comparison of Trend Scoring and IRT Linking in Mixed-Format Test Equating Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco, CA Hua Wei Lihua Yao,

More information

Exploring Measures of Central Tendency (mean, median and mode) Exploring range as a measure of dispersion

Exploring Measures of Central Tendency (mean, median and mode) Exploring range as a measure of dispersion Unit 5 Statistical Reasoning 1 5.1 Exploring Data Goals: Exploring Measures of Central Tendency (mean, median and mode) Exploring range as a measure of dispersion Data: A set of values. A set of data can

More information

The risk assessment of ships manoeuvring on the waterways based on generalised simulation data

The risk assessment of ships manoeuvring on the waterways based on generalised simulation data Safety and Security Engineering II 411 The risk assessment of ships manoeuvring on the waterways based on generalised simulation data L. Gucma Maritime University of Szczecin, Poland Abstract This paper

More information

Estimating a Toronto Pedestrian Route Choice Model using Smartphone GPS Data. Gregory Lue

Estimating a Toronto Pedestrian Route Choice Model using Smartphone GPS Data. Gregory Lue Estimating a Toronto Pedestrian Route Choice Model using Smartphone GPS Data Gregory Lue Presentation Outline Introduction Background Data Smartphone Data Alternative Route Generation Choice Model Toronto

More information