A definition of depth for functional observations

Similar documents
Today s plan: Section 4.2: Normal Distribution

Business and housing market cycles in the euro area: a multivariate unobserved component approach

North Point - Advance Placement Statistics Summer Assignment

Chapter 12 Practice Test

Robust specification testing in regression: the FRESET test and autocorrelated disturbances

STANDARD SCORES AND THE NORMAL DISTRIBUTION

A Class of Regression Estimator with Cum-Dual Ratio Estimator as Intercept

Section I: Multiple Choice Select the best answer for each problem.

1. The data below gives the eye colors of 20 students in a Statistics class. Make a frequency table for the data.

Data Set 7: Bioerosion by Parrotfish Background volume of bites The question:

Results and Discussion for Steady Measurements

Lecture 1: Knot Theory

Determination of the Design Load for Structural Safety Assessment against Gas Explosion in Offshore Topside

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate

Chapter 2: Modeling Distributions of Data

Confidence Interval Notes Calculating Confidence Intervals

STAT 155 Introductory Statistics. Lecture 2-2: Displaying Distributions with Graphs

Math 4. Unit 1: Conic Sections Lesson 1.1: What Is a Conic Section?

Biomechanics and Models of Locomotion

Variable Face Milling to Normalize Putter Ball Speed and Maximize Forgiveness

CHAPTER 2 Modeling Distributions of Data

Bivariate Data. Frequency Table Line Plot Box and Whisker Plot

The Intrinsic Value of a Batted Ball Technical Details

An Empirical Comparison of Regression Analysis Strategies with Discrete Ordinal Variables

Running head: DATA ANALYSIS AND INTERPRETATION 1

Chapter 20. Planning Accelerated Life Tests. William Q. Meeker and Luis A. Escobar Iowa State University and Louisiana State University

Bridge Decomposition of Restriction Measures Tom Alberts joint work with Hugo Duminil (ENS) with lots of help from Wendelin Werner University of

2 When Some or All Labels are Missing: The EM Algorithm

A Nomogram Of Performances In Endurance Running Based On Logarithmic Model Of Péronnet-Thibault

The purpose of this experiment is to find this acceleration for a puck moving on an inclined air table.

Incompressible Flow over Airfoils

Journal of Quantitative Analysis in Sports Manuscript 1039

Online Companion to Using Simulation to Help Manage the Pace of Play in Golf

REPLACING REDUNDANT STABILOMETRY PARAMETERS WITH RATIO AND MAXIMUM DEVIATION PARAMETERS

Exemplar for Internal Achievement Standard. Mathematics and Statistics Level 1

GaitAnalysisofEightLegedRobot

Gait Analyser. Description of Walking Performance

Driver Behavior at Highway-Rail Grade Crossings With Passive Traffic Controls

On the convergence of fitting algorithms in computer vision

5.1 Introduction. Learning Objectives

USING A CALCULATOR TO INVESTIGATE WHETHER A LINEAR, QUADRATIC OR EXPONENTIAL FUNCTION BEST FITS A SET OF BIVARIATE NUMERICAL DATA

COMPLETING THE RESULTS OF THE 2013 BOSTON MARATHON

Exploring Measures of Central Tendency (mean, median and mode) Exploring range as a measure of dispersion

PREDICTING the outcomes of sporting events

The risk assessment of ships manoeuvring on the waterways based on generalised simulation data

Using sensory feedback to improve locomotion performance of the salamander robot in different environments

Motion Control of a Bipedal Walking Robot

MATH 114 QUANTITATIVE REASONING PRACTICE TEST 2

Besides the reported poor performance of the candidates there were a number of mistakes observed on the assessment tool itself outlined as follows:

Understanding Winter Road Conditions in Yellowstone National Park Using Cumulative Sum Control Charts

Policy Management: How data and information impacts the ability to make policy decisions:

Legendre et al Appendices and Supplements, p. 1

Name May 3, 2007 Math Probability and Statistics

High Frequency Acoustical Propagation and Scattering in Coastal Waters

A SURVEY OF 1997 COLORADO ANGLERS AND THEIR WILLINGNESS TO PAY INCREASED LICENSE FEES

Gait Analysis of Wittenberg s Women s Basketball Team: The Relationship between Shoulder Movement and Injuries

Calculation of Trail Usage from Counter Data

A Finite Model Theorem for the Propositional µ-calculus

KARIYA Takeaki, Faculty Fellow, RIETI

On the Impact of Garbage Collection on flash-based SSD endurance

Basketball data science

The Credible Diversity Plot: A Graphic for the Comparison of Biodiversity

SCREENING OF TOPOGRAPHIC FACTOR ON WIND SPEED ESTIMATION WITH NEURAL NETWORK ANALYSIS

Rolling Out Measures of Non-Motorized Accessibility: What Can We Now Say? Kevin J. Krizek University of Colorado

Knots and their projections I

A Hare-Lynx Simulation Model

UAB MATH-BY-MAIL CONTEST, 2004

Hitting with Runners in Scoring Position

AP 11.1 Notes WEB.notebook March 25, 2014

Evaluating NBA Shooting Ability using Shot Location

Academic Grant CPR process monitors provided by Zoll. conflict of interest to declare

Special Topics: Data Science

Human Pose Tracking III: Dynamics. David Fleet University of Toronto

9.3 Histograms and Box Plots

2017 AMC 12B. 2. Real numbers,, and satisfy the inequalities,, and. Which of the following numbers is necessarily positive?

The Academy of Model Aeronautics ALPHA: Potential Energy Background Information for the Teacher

Equine Results Interpretation Guide For Cannon Angles May 2013

PSY201: Chapter 5: The Normal Curve and Standard Scores

Descriptive Stats. Review

Optimizing Cyclist Parking in a Closed System

Supplementary Figure S1

The pth percentile of a distribution is the value with p percent of the observations less than it.

13. TIDES Tidal waters

Home Team Advantage in English Premier League

Estimating a Toronto Pedestrian Route Choice Model using Smartphone GPS Data. Gregory Lue

Journal of Human Sport and Exercise E-ISSN: Universidad de Alicante España

Sensing and Modeling of Terrain Features using Crawling Robots

Minimal influence of wind and tidal height on underwater noise in Haro Strait

Bézier Curves and Splines

The Willingness to Walk of Urban Transportation Passengers (A Case Study of Urban Transportation Passengers in Yogyakarta Indonesia)

Naïve Bayes. Robot Image Credit: Viktoriya Sukhanova 123RF.com

DUE TO EXTERNAL FORCES

A MODIFIED DYNAMIC MODEL OF THE HUMAN LOWER LIMB DURING COMPLETE GAIT CYCLE

Taking Your Class for a Walk, Randomly

A Fair Target Score Calculation Method for Reduced-Over One day and T20 International Cricket Matches

An Application of Signal Detection Theory for Understanding Driver Behavior at Highway-Rail Grade Crossings

1wsSMAM 319 Some Examples of Graphical Display of Data

1. In a hypothesis test involving two-samples, the hypothesized difference in means must be 0. True. False

Entropy-based benchmarking methods

A Cost Effective and Efficient Way to Assess Trail Conditions: A New Sampling Approach

Transcription:

A definition of depth for functional observations Sara López-Pintado and Juan Romo Departamento de Estadística y Econometría Universidad Carlos III de Madrid Madrid, Spain A definition of depth for functional observations 1

1. MOTIVATION AND BACKGROUND 2. A NEW CONCEPT OF DEPTH FOR FUNCTIONAL DATA 3. FINITE-DIMENSIONAL VERSION 4. SOME PROPERTIES 5. APPLICATIONS 6. CONCLUSIONS A definition of depth for functional observations 2

1. MOTIVATION AND BACKGROUND 70 60 50 40 30 20 10 0 10 20 0 2 4 6 8 10 12 14 16 18 20 Question: which one is the deepest function? A definition of depth for functional observations 3

The observations x 1 (t), x 2 (t),..., x n (t) are n functions defined on an interval I. A definition of depth for functional observations 4

Why considering functional data? 1. In many areas of knowledge the process generating the data provides us in a natural way with a set of functions. 2. Many problems are better approached if the observations are treated as continuous functions. 3. Each curve from the sample can be observed at different points and the separation of these points can be irregular. 4. Technological advance with the development of progressively more precise and sophisticated equipment makes possible the acquisition of a large number of data, usually called high frequency data, that allow us to express the data as functions. A definition of depth for functional observations 5

Goal: to introduce a definition of depth for functional data. This concept will be used to measure the centrality of a curve with respect to a set of curves. E.g.: to define the deepest function. The functional depth provides a center-outward ordering of a sample of curves. Order statistics will be defined. (L statistics). The idea of deepest point of a set of data allows to classify a new observation by using the distance to a class deepest point. A definition of depth for functional observations 6

The notion of depth has been extensively studied in the multivariate context. Some definitions of data depth are: 1. The Mahalanobis depth (Mahalanobis, 1936). 2. The half-space depth (Hodges, 1955, Tukey, 1975). 3. The Oja depth (Oja, 1983). 4. The simplicial depth (Liu, 1990). 5. The majority depth (Singh, 1991). 6. The projection depth (Zuo, 2003). A definition of depth for functional observations 7

Liu (1990), and Zuo and Serfling (2000) introduce general conditions to define a notion of statistical depth. Key properties a concept of depth should verify: Affine invariance Maximality at center Monotonicity relative to deepest point Vanishing at infinity A definition of depth for functional observations 8

Fraiman and Muniz (2001) defined a concept of depth for functional data. Let X 1 (t),..., X n (t) be i.i.d. stochastic processes defined on [0,1]. Let F t be the univariate marginal distribution of X 1 (t). Let D n be any concept of depth in R. Consider for every t [0, 1] D n (X i (t)) = Z i (t), (univariate depth of X i (t) at t with respect to X 1 (t),..., X n (t)). Defining I i = 1 0 Z i (t)dt, 1 i n, the set of functions X 1 (t),..., X n (t) can be ordered according to the value of I i. A definition of depth for functional observations 9

2. A NEW CONCEPT OF DEPTH FOR FUNCTIONAL DATA Let x 1 (t),..., x n (t) be a sample of functions. Define V (x i1,..., x ik ) = { } x: min {x i r (t)} x(t) max {x i r (t)}, t [0, 1] r=1,...,k r=1,...,k (Functions whose graphs belong to the area delimited by the graphs of x i1,x i2,..., x ik ). Equivalently, V = { x(t) = α t min {x i r (t)} + (1 α t ) r=1,...,k } max {x i r (t)}, t [0, 1], α t [0, 1] r=1,...,k A definition of depth for functional observations 10

30 20 10 V(x1,x2) 0 10 x x2 20 x1 30 40 0 50 100 150 200 250 300 350 400 A definition of depth for functional observations 11

60 50 40 30 20 10 0 10 0 2 4 6 8 10 12 14 16 18 20 A definition of depth for functional observations 12

The J-depth for x is: S n,j (x) = J j=2 S j) n (x), where S j) n (x) = 1 i 1 <i 2 <...i j n I(x V (x i1,x i2,..., x ij )) ( n j) are proportions of bands containing x; this gives a center-outward ordering of the sample of curves. A deepest function µ n,j will satisfy: µ n,j = arg max x {x 1,...,x n } S n,j (x) A definition of depth for functional observations 13

The population version is S J (x) = J j=2 S j) (x) = J j=2 P (x V (x 1, x 2,..., x j )), and a population deepest function is a function µ J maximizing S J ( ). A definition of depth for functional observations 14

Example: Trimmed mean for functional data The functional version of the α trimmed mean will be the average of the n [nα] deepest observations: where 1 n ( n i=1 m α n,j = n I [β,+ ] (S n,j (x i ))x i n, β > 0, I [β,+ ] (S n,j (x i )) i=1 i=1 ) I [β,+ ] (S n,j (x i )) 1 α. A definition of depth for functional observations 15

3. FINITE-DIMENSIONAL VERSION Let F be a probability distribution in R d ; d 1. Let {y 1,..., y n } be a random sample from F. A multivariate observation can be seen as a function defined on {1, 2,..., d} : y(l) is the l th component of the vector y. A definition of depth for functional observations 16

For d = 2, the finite dimensional band V (y 1, y 2,..., y j ) is the interval in the plane determined by the following four vertices ( ) min {y k(1)}, min {y k(2)}, k=1,...,j k=1,...j ( ) max {y k(1)}, max {y k(2)}, k=1,...,j k=1,...,j ( ) min {y k(1)}, max {y k(2)} k=1,...,j k=1,...,j ( ) max {y k(1)}, min {y k(2)} k=1,...,j k=1,...,j A definition of depth for functional observations 17

15 10 5 0 0 2 4 6 8 10 12 S j) n (y) is the proportion of intervals V (y i1, y i2,..., y ij ) determined by j sample points containing y. A definition of depth for functional observations 18

Example: deepest points for S n,2 ( ) 15 10 5 0 0 2 4 6 8 10 12 Remark: They essentially coincide with Liu s simplicial deepest points. A definition of depth for functional observations 19

Another example: deepest points for S n,3 ( ) 2.5 2 1.5 1 0.5 0 0.5 1 1.5 2 2.5 2.5 2 1.5 1 0.5 0 0.5 1 1.5 2 2.5 A definition of depth for functional observations 20

How does the choice of J affect the depth? 2.5 2 1.5 1 0.5 0 0.5 1 1.5 2 2.5 2.5 2 1.5 1 0.5 0 0.5 1 1.5 2 2.5 J = 2 A definition of depth for functional observations 21

2.5 2 1.5 1 0.5 0 0.5 1 1.5 2 2.5 2.5 2 1.5 1 0.5 0 0.5 1 1.5 2 2.5 J = 3 A definition of depth for functional observations 22

2.5 2 1.5 1 0.5 0 0.5 1 1.5 2 2.5 2.5 2 1.5 1 0.5 0 0.5 1 1.5 2 2.5 J = 4 A definition of depth for functional observations 23

4. SOME PROPERTIES Finite-dimensional data: 1. The deepest point in R (with S J ) coincides with the usual univariate median; moreover, the order induced by S J is independent of J. 2. S J ( ) is invariant under transformations of type T (y) = A y + b, where A is a diagonal and invertible d d matrix and b R d : S J,T (T y) = S J (y) 3. If F is absolutely continuous and symmetric then S J (αy) is a monotone nonincreasing function in α 0 for all y R d. A definition of depth for functional observations 24

4. S J ( ) vanishes at infinity: sup y M S J (y) 0 if M 5. If the marginal distributions of F are absolutely continuous then S J ( ) is continuous. 6. S n,j ( ) is strongly consistent: S n,j (y) a.s. S J (y) when n, y R d A definition of depth for functional observations 25

7. S n,j (y) is uniformly consistent: sup S n,j (y) S J (y) 0 a.s. y R d as n 8. If S J ( ) is uniquely maximized at µ, and µ n is a sequence of random variables satisfying S n,j (µ n ) = sup x R d S n,j (x), then µ n µ a.s. as n A definition of depth for functional observations 26

Functional data: 1. Let Q [0, 1] = {q 1, q 2,..., q n,...} and x n = (x(q 1 ),..., x(q n )). Then: S J (x n ) S J (x), when n 2. S J ( ) is invariant under transformations of type T (x) = a(t) x(t) + b(t) : S J,T (T x) = S J (x) 3. sup x M S J (x) 0 if M A definition of depth for functional observations 27

4. S J ( ) is continuous. 5. S n,j (x) is a consistent estimator: for any x, S n,j (x) a.s. S J (x) when n A definition of depth for functional observations 28

5. APPLICATIONS 70 60 50 40 30 20 10 0 10 20 0 2 4 6 8 10 12 14 16 18 20 Angles in the sagittal plane formed by the hip as 39 children go through a gait cycle. (Ramsay and Silverman, 1997) A definition of depth for functional observations 29

50 40 30 20 10 0 10 0 2 4 6 8 10 12 14 16 18 20 Six deepest curves (J = 5). The index S n,j when J increases gives the same centered-outward order. A definition of depth for functional observations 30

70 60 50 40 30 20 10 0 10 20 0 2 4 6 8 10 12 14 16 18 20 Three deepest curves represented with colours red, green and yellow. The red curve is the median function. A definition of depth for functional observations 31

70 60 50 40 30 20 10 0 10 20 0 2 4 6 8 10 12 14 16 18 20 The ten less deepest curves are in red. A definition of depth for functional observations 32

90 80 70 60 50 40 30 20 10 0 10 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Angles in the sagittal plane formed by the knee as 39 children go through a gait cycle. The curve in red is the deepest one. A definition of depth for functional observations 33

30 20 10 0 10 20 30 40 0 50 100 150 200 250 300 350 400 Daily temperature in different weather stations in Canada during one year. The raw data were smoothed considering a Fourier basis with 65 elements in the basis. A definition of depth for functional observations 34

30 20 10 0 10 20 30 40 0 50 100 150 200 250 300 350 400 Five deepest curves with, S n,j, J = 3. A definition of depth for functional observations 35

6. CONCLUSIONS A new definition of depth for functional observations is introduced. This concept of depth can be particularized to the finite-dimensional case and is an alternative definition of depth for multivariate data. It verifies essentially the properties established by Liu (1990) and Zuo and Serfling (2000). It is convenient for high-dimensional data because regardless of the dimension of the data, low values of J can be considered. A definition of depth for functional observations 36

REFERENCES Arcones, M.A. and Giné, E. (1993). Limit theorems for U-processes.. Ann. Probab. 21, 1494-1542. Arcones, M.A., Chen, Z. and Giné, E. (1994). Estimators related to U-processes with applications to multivariate medians: asymptotic normality.. Ann. Probab. 21, 1494-1542. Fraiman, R. and Muniz, G. (2001). Trimmed mean for functional data. Test. 10, 419-440. Liu, R. (1990). On a notion of data depth based on random simplices. Ann. Statist. 18, 405-414. A definition of depth for functional observations 37

Liu, R. (1995). Control charts for multivariate processes. J. Amer. Statist. Assoc. 90, 1380-1388. Liu, R., Parelius, J.M. and Singh. (1999). Multivariate analysis by data depth: Descriptive statistics, graphics and inference. Ann. Statist. 27, 783-858. Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proc. Nat. Acad. Sci. India 12, 49-55. Mardia, K., Kent, J. and Bibby, J. (1979). Multivariate Analysis. Academic Press, New York. Oja, H. (1983). Descriptive statistics for multivariate distributions. Statist. Probab. Lett. 1, 327-332. Pollard, D. (1984). Convergence of stochastic processes. Springer Verlag, New York. A definition of depth for functional observations 38

Ramsay, J.O. and B.W. Silverman (1997). Functional Data Analysis. Springer Verlag, New York. Rousseeuw, P.J. and Leroy, A. M. (1987). Robust Regression and Outlier Detection. Wiley, New York. Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York. Small, C.G. (1990). A survey of multidimensional medians. International Statistitical Review 58, 263-277. Tukey, J. (1975). Mathematics and picturing data. In Proceedings of the 1975 International Congress of Mathematics 2, 523-531. Yeh, A. and Singh, K. (1997). Balanced confidence sets based on the Tukey depth. J. Roy. Statist. Soc. Ser. B 3, 639-652. A definition of depth for functional observations 39

Zuo, Y. and Serfling, R.J. (2000). General Notions of Statistical Depth Function, Ann. Statist. 28, 461-482. A definition of depth for functional observations 40