Machine Learning Application in Aviation Safety


1 Machine Learning Application in Aviation Safety: Surface Safety Metric and MOR Classification. Presented to: ART. By: Firdu Bati, PhD, FAA. Date: September 2018

2 Agenda
- Surface Safety Metric (SSM) development: text mining, Gradient Boosting Machine (GBM), Bayesian networks, time series forecasting
- Surface MOR classification: text mining, GBM

3 Surface Safety Metric (SSM) 3

4 Surface Metric: Background
- The primary goal of a surface safety metric is to gauge the safety performance of the airport environment.
  - All relevant events are accounted for.
- The SSM was developed because existing FAA Air Traffic Organization (ATO) runway safety metrics focus only on runway incident occurrences.
  - Existing metric: count of Category A and B events divided by operation count.

5 Surface Metric: Background
- Incident occurrence-based metrics can be deceptive:
  - Increased reporting gives the appearance of reduced safety.
  - Accidents have no effect on the metric.
  - The ultimate goal is to prevent injury and damage, not incidents.
- SSM goal: develop a risk-based surface safety metric that includes accident, Runway Excursion (RE), Runway Incursion (RI), and Surface Incident (SI) data.

6 SSM: Approach
- Identify applicable accident and incident data using a classifier.
- Use probabilistic modeling to assign risk weights to each possible accident or incident (primarily outcome-based).
- Calculate individual event SSM scores.
- Aggregate annual SSM scores.
- Generate a report to Congress.

7 Surface Safety Metric (SSM)
- Measures surface safety using all relevant accidents and incidents.
- Assigns a weight to each event based on its proximity to a fatality.
- Employs multiple advanced modeling techniques:
  - Text mining to extract useful information from unstructured narratives
  - Machine learning (GBM) to categorize accident types
  - Bayesian network to determine a weight for each event type
  - Time series forecasting to derive appropriate metric target values

8 SSM (continued): Periodic Scoring Process
- Data are gathered from NTSB accident, ASIAS RE (8 sources), and AJI RI-SI data sources.
- NTSB events are categorized as Runway Excursions, Runway Collisions, or Taxiway Collisions.
- All accidents and incidents are scored using weights derived from the Bayesian network model.
- Most activities in this process have been automated.

9 Applicable Event Collection The number of reported events has been increasing 9

10 Weight Calculation
- Considers each accident type's proximity to a fatality
- Ensures weight ordering (Injury > Damage)
- Maps incident types to accident types

11 Calculated Weights
[Table: two panels, Commercial Weights and Non-Commercial Weights, each with columns Outcome and Weight.
Rows: INJURY (Fatal, Serious, Minor); DAMAGE (Destroyed, Substantial, Minor); INCIDENT (Category A RI, Category B RI, Category C RI, Category D RI, RE Incident, Surface Incident).
Values surviving in the transcription, in order of appearance: Category C RI 9.0E-5, Category D RI 8.0E-5, RE Incident 8.5E-6, Surface Incident 8.3E-5, Surface Incident 7.8E-5.]

12 Weight Application
- Accidents
  - Penalty term: number of people injured and airframes damaged, multiplied by the corresponding weights
  - Credit term: fraction of people/airframes with lesser injury/damage
  - For each accident, add the penalty terms and subtract the credit terms
- Incidents
  - Each incident is assigned its incident-type weight

13 Credit Penalty Correction Term
Example: two accidents
- Accident A: all 5 occupants injured
- Accident B: 5 out of 200 occupants injured
Without credit term:
- Both A and B: 5 * 0.75 = 3.75
With credit term:
- A: 5 * 0.75 = 3.75
- B: 5 * 0.75 - (195 / 200) * 0.75 = 3.01875
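The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the penalty-minus-credit idea as the slide describes it; the function name `accident_score` and its argument names are hypothetical, and the 0.75 weight is simply the value used in the example.

```python
def accident_score(affected, total, weight):
    """Penalty minus credit for one outcome type in one accident.

    affected: number of people injured (or airframes damaged)
    total:    number of people on board (or airframes involved)
    weight:   outcome weight (0.75 in the slide's example)
    All names are illustrative, not taken from the FAA implementation.
    """
    if total <= 0 or not 0 <= affected <= total:
        raise ValueError("need 0 <= affected <= total and total > 0")
    penalty = affected * weight                     # count of affected times weight
    credit = ((total - affected) / total) * weight  # unaffected fraction times weight
    return penalty - credit


score_a = accident_score(5, 5, 0.75)    # all occupants injured, no credit
score_b = accident_score(5, 200, 0.75)  # 195 of 200 occupants uninjured
```

With the credit term, accident B scores lower than accident A even though both injured five people, which is the distinction the slide is drawing.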

14 Calculating Annual SSM
- Weights for commercial and non-commercial events are summed for each fiscal year.
- The commercial and non-commercial RSM values are then normalized by millions of operations.
- Note: the risk levels for commercial and non-commercial operations differ significantly.
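A minimal sketch of the annual aggregation, assuming each event has already been scored and tagged with a fiscal year. The function name, the input layout, and all numbers below are made up for illustration; commercial and non-commercial events would each be run through it separately.

```python
from collections import defaultdict


def annual_metric(event_scores, operations):
    """Sum event scores per fiscal year and normalize per million operations.

    event_scores: iterable of (fiscal_year, score) pairs
    operations:   dict mapping fiscal_year -> number of operations
    """
    totals = defaultdict(float)
    for fiscal_year, score in event_scores:
        totals[fiscal_year] += score
    return {
        fy: totals[fy] / (operations[fy] / 1_000_000)
        for fy in sorted(totals)
    }


# Illustrative values only, not FAA data.
scores = [(2015, 3.75), (2015, 0.25), (2016, 2.0)]
ops = {2015: 2_000_000, 2016: 4_000_000}
metric = annual_metric(scores, ops)
```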

15 Commercial RSM Over Time (Per Million Operations) *Risk decreasing, operations slightly decreased over time 15

16 Non-Commercial RSM Over Time (Per Million Operations) *Risk fluctuating, increasing in recent years, operations decreased over time 16

17 RSM Targets
- Time series forecasting is used to set the targets
[Figures: Commercial Time Series with Forecast; Non-Commercial Time Series with Forecast]

18 Surface Mandatory Occurrence Report (MOR) Classification 18

19 Surface MOR: Background
- The surface MOR classification task was born out of a desire to reduce the amount of manual event review performed by the Runway Incursion Analysis Team (RIAT).
- SMEs and analysts in the Runway Safety office currently review approximately 2,000 MORs per year by hand to classify surface events.

20 Surface MOR: Classification
- Classification uses the narratives of the MORs.
- The algorithm can classify 90% of these MORs with 95% accuracy.
- The remaining MORs still require manual review, but this is a significant reduction in workload.

21 Current Surface MOR Classification 21

22 Enhanced Surface MOR Classification
[Flow diagram; labels: Surface MOR, Classifier, SI / RE / D, A / B / C]

23 Model Performance 23

24 Approach
- Use existing categories as ground truth
- Preprocess narratives using text mining
- Apply feature engineering (n-grams, TF-IDF)
- Fit a classifier (GBM)

25 Ground Truth
- Use 20 years of MOR data
- Use the existing surface event classification, but with the MOR raw text
- Classified into SI, RE, RI (AB, C, D)
- Few structured fields

26 Preprocessing & Cleaning
- Remove tags, exclamation marks, punctuation, numbers, extra whitespace, etc.
- Convert to lower case: standardize the entire text so that case differences disappear, e.g., Runway → runway, RWY → rwy

27 Preprocessing & Cleaning
- Remove stop words: words that help in sentence construction but carry no real information, such as a, an, the, they, where, etc.
- Stemming: convert terms to their root form, which helps capture the intent of terms more precisely; e.g., veered and veering both reduce to the root word veer
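The cleaning steps on these two slides can be sketched with the Python standard library. This is a toy version under stated assumptions: the stop-word set contains only the words the slide lists, and `strip_suffix` is a crude stand-in for a real stemmer (such as Porter), not the one used in the presentation.

```python
import re

# Only the stop words named on the slide; a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "they", "where"}


def strip_suffix(token):
    """Toy stemmer: drop a trailing 'ing' or 'ed' if at least 3 letters remain."""
    for suffix in ("ing", "ed"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(narrative):
    """Clean one narrative and return a list of stemmed tokens."""
    text = re.sub(r"<[^>]+>", " ", narrative)  # remove markup tags
    text = text.lower()                        # standardize case
    text = re.sub(r"[^a-z\s]", " ", text)      # drop punctuation and numbers
    tokens = text.split()                      # also collapses whitespace
    return [strip_suffix(t) for t in tokens if t not in STOP_WORDS]


tokens = preprocess("<b>The aircraft VEERED off Runway 27!</b>")
```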

28 Feature Engineering (n-grams)
- When two or more words occur together, they may provide more information to the model
- One word is a 1-gram: runway
- Two words make a 2-gram: depart runway
- Three words make a 3-gram: depart short runway
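As a small illustration, n-grams can be generated from a token list by sliding a window of width n. The helper name `ngrams` is hypothetical.

```python
def ngrams(tokens, n):
    """Return all contiguous n-word sequences, joined by spaces."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


words = ["depart", "short", "runway"]
unigrams = ngrams(words, 1)
bigrams = ngrams(words, 2)
trigrams = ngrams(words, 3)
```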

29 Feature Engineering (TF-IDF)
- A weighting scheme based on term importance
- Product of two quantities: term frequency and inverse document frequency
- Rarely occurring terms may carry more information than frequent ones
- R and Python have packages that perform these calculations easily

30 Gradient Boosting Machine (GBM)
- A single decision tree is often too specific and unstable (high variance)
- Many models (usually weak ones) working together result in a more accurate model
  - Committee-like decision
- Common tree ensembles: bagging (e.g., Random Forest, RF) and boosting
*Employed stochastic gradient boosting for SSM (LightGBM)

31 Fitting Process 31

32 Backup Slides 32

33 SSM Assumptions
- Assumed the following hierarchy of accident/incident severity based on domain expert input: fatality, injury, aircraft damage, incidents.
- Incident-to-accident mapping assumption:
  - RIs → Runway Collision
  - SIs → Taxiway Collision
  - RE Incidents → RE Accident
- Assumed the following criteria for splitting the dataset into Commercial and Non-Commercial:
  - Part 121, 129, 135: Commercial
  - Others: General Aviation / Non-Commercial
- Incident data are only available starting in 1997; however, the model used accident data since [year not preserved].
  - This is more conservative

34 Technical Approach Prepare National Transportation Safety Board (NTSB), RI-SI Database, and RE Database data. Assume the worst possible accident involves fatal injury. Assign weights to accidents and incidents based on relative distance to fatality. 34

35 Data Pre-Processing
- NTSB data selection
  - Reports without phase codes relevant to the runway environment were removed
  - Remaining data were classified as runway collision, taxiway collision, or runway excursion by training a text mining model and validating the results
- Data merging
  - RE and RI-SI data have already been reviewed and categorized by domain experts
  - Some overlap exists between the RE and NTSB datasets; duplicates were removed (NTSB records retained)

36 FY16 Dataset
- NTSB (FY81-FY16): Runway Collision, Taxiway Collision, Runway Excursions; # of events: ,520
- RI-SI (FY98-FY16): A, B, C, D, SI; # of events: ,170 6,288 9,073
- RE (FY12-FY16): Runway Excursions; # of events: 1,241

37 Three Step Weighting Process
1. Weights are assigned to types of accidents based on proximity to a fatal accident.
2. Appropriate domain expert assumptions are used to reorder the weights: Injury > Damage > Incident without injury/damage.
3. Incidents are assigned weights based on their corresponding accident types.

38 Information Gain as a Measure of Proximity
Marginal entropy of a fatal accident:
$$H(Fa) = -\sum_{x \in X} P(x) \log_2 P(x) \tag{1}$$
Conditional entropy of a fatal accident given the accident type:
$$H(Fa \mid Acc) = \sum_{i} P(Acc_i)\, H(Fa \mid Acc_i) \tag{2}$$
Mutual information (MI) between a fatality and any generic accident:
$$I(Fa, Acc) = H(Fa) - H(Fa \mid Acc) \tag{3a}$$
$$I(Fa, Acc) = \sum_{fa \in Fa} \sum_{acc \in Acc} p(fa, acc) \log_2 \frac{p(fa, acc)}{p(fa)\, p(acc)} \tag{3b}$$
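The entropy and mutual information definitions on this slide can be checked numerically with a short sketch. It assumes the joint distribution is given as a nested list `joint[i][j] = p(fa_i, acc_j)`; the function names and the example probabilities are illustrative, and the presentation itself computes these quantities through a Bayesian network rather than a raw table.

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)


def conditional_entropy(joint):
    """H(Fa | Acc) = sum_j p(acc_j) * H(Fa | Acc = acc_j)."""
    p_acc = [sum(col) for col in zip(*joint)]
    h = 0.0
    for j, pj in enumerate(p_acc):
        if pj > 0:
            h += pj * entropy([row[j] / pj for row in joint])
    return h


def mutual_information(joint):
    """I(Fa, Acc) from the double-sum form of the definition."""
    p_fa = [sum(row) for row in joint]
    p_acc = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * log2(p / (p_fa[i] * p_acc[j]))
    return mi


# Made-up joint distribution over (fatal / not fatal) x (two accident types).
joint = [[0.1, 0.2], [0.3, 0.4]]
mi_sum_form = mutual_information(joint)
mi_entropy_form = entropy([sum(r) for r in joint]) - conditional_entropy(joint)
```

The two forms agree up to floating-point error, which is a quick sanity check that the entropy-difference form and the double-sum form describe the same quantity.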

39 Information Gain as a Measure of Proximity
- Bayesian network employed
  - Accounts for correlation between different outcomes
  - Computes information gain

40 Weights: Penalty & Credit
- Injuries: penalty and credit terms for injured and non-injured occupants, respectively, where $N_{total}$ is the total number of occupants
$$w_I = \left( Inj_{Cnt} - \frac{NonInj_{Cnt}}{N_{total}} \right) w_i$$
- Damage: penalty and credit terms for damaged and non-damaged airframes, respectively, where $N_{total}$ is the total number of airframes
$$w_D = \left( Dmg_{Cnt} - \frac{NonDmg_{Cnt}}{N_{total}} \right) w_d$$
- Incidents: ratio of the weight of the corresponding accident type to the frequency of the incident
$$w_{Inc} = \frac{w_{Acc}}{n_{Inc}}$$
where $W_{Acc}$ maps as $W_{RC} \to W_{RI}$, $W_{TC} \to W_{SI}$, $W_{RE\_Acc} \to W_{RE\_Inc}$

41 Credit Penalty Correction Term Example
Two accidents: A: 5/5 injured; B: 5/200 injured
Without credit term, both A and B: 5 * 0.75 = 3.75
With credit term: A: (5 * 0.75) = 3.75; B: (5 * 0.75) - (0.975 * 0.75) = 3.01875

42 Application of Weights
- Two potential applications:
  - Aggregation: sum of all undesired outcome weights
  - Worst-outcome: weight of the worst outcome
*The choice makes little difference in the relative risk profile, primarily due to the limiting effect of the worst outcome [adding small numbers to a large number]

43 Combined Risk & Event Count
[Chart: Event Count vs. Cumulative Risk; series: Combined Accident Count, Combined Incident Count, Combined Weight (MA), Poly. (Combined Weight (MA))]
*Risk decreased, incidents increased, accidents relatively constant over time

44 Commercial Risk & Event Count *Risk decreased, incident increased, accident relatively constant over time 44

45 Non-Commercial Risk & Event Count *Risk decreased, incident increased, accident relatively constant over time 45

46 Commercial Risk & Ops (Per Million)
[Chart: Normalized Commercial Risk & Operations; axes: Risk index, Millions of Operations; series: Commercial Operations, Commercial Normalized (MA), Poly. (Commercial Normalized (MA))]
*Risk decreased, ops slightly decreased over time

47 Non-Commercial Risk & Ops (Per Million)
[Chart: Normalized Non-Commercial Risk & Operations; axes: Risk index, Millions of Operations; series: Non-Commercial Operations, Non-Commercial Normalized (MA), Poly. (Non-Commercial Normalized (MA))]
*Risk increased, ops decreased over time

48 Commercial Target 95% confidence value of time series forecast:

49 Non-Commercial Target 95% confidence value of time series forecast: 2.6 Forecast value of time series:

50 Forecast Models Commercial Target 50

51 Forecast Models Non-Commercial Target 51

52 Feature Engineering (TF-IDF)
$$tf(t, d) = k + (1 - k) \frac{f_{t,d}}{\max\{f_{t',d} : t' \in d\}}$$
k is a smoothing term, and max(.) reduces the bias toward longer documents.
$$idf(t, D) = \log \frac{N}{1 + |\{d \in D : t \in d\}|}$$
Measures how common or rare a term is in the corpus; N is the number of documents.
$$tfidf(t, d, D) = tf(t, d) \cdot idf(t, D)$$
Takes a high value for terms with high frequency in a document and low frequency in the corpus.
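The three formulas on this slide translate directly into a few lines of Python. This is a sketch only: documents are assumed to be token lists, the function names are hypothetical, and `k=0.5` is an assumed default because the slide does not state the smoothing value used.

```python
from collections import Counter
from math import log


def tf(term, doc, k=0.5):
    """Smoothed term frequency: k + (1 - k) * f(t, d) / max frequency in d."""
    counts = Counter(doc)
    return k + (1 - k) * counts[term] / max(counts.values())


def idf(term, corpus):
    """log(N / (1 + number of documents containing the term))."""
    doc_freq = sum(1 for doc in corpus if term in doc)
    return log(len(corpus) / (1 + doc_freq))


def tf_idf(term, doc, corpus, k=0.5):
    return tf(term, doc, k) * idf(term, corpus)


# Tiny made-up corpus of already-preprocessed narratives.
corpus = [
    ["depart", "runway", "runway"],
    ["cross", "runway"],
    ["veer", "runway"],
    ["taxi", "hold"],
]
common = tf_idf("runway", corpus[0], corpus)  # appears in 3 of 4 documents
rare = tf_idf("veer", corpus[2], corpus)      # appears in 1 of 4 documents
```

In this toy corpus the frequent term scores lower than the rare one, matching the slide's point that rarely occurring terms may carry more information. In practice a library implementation would normally be used; note that libraries differ in their exact smoothing conventions.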

53 Gradient Boosting Machine 53

54 REGRESSION TREES
- Simple, non-parametric supervised learning
- Fits a piecewise constant function using recursive partitioning
[Figure panels: Tree Representation; Partitions; Prediction Surface]

55 REGRESSION TREES
$$T(x; \Theta) = \sum_{m=1}^{M} c_m I(x \in R_m), \quad \Theta = \{R_m, c_m\}_{1}^{M}$$
Minimize loss:
$$\hat{\Theta} = \arg\min_{\Theta} \sum_{m=1}^{M} \sum_{x_i \in R_m} L(y_i, c_m)$$
Find regions using greedy search and estimate each value as the average:
$$\hat{c}_m = \mathrm{ave}(y_i \mid x_i \in R_m)$$
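One greedy step of this procedure, for a single numeric feature and squared-error loss, might look like the sketch below. The function name `best_split` and the data are illustrative; a full tree would apply the same search recursively to each resulting region.

```python
def best_split(xs, ys):
    """Greedy search for the threshold that minimizes squared error.

    Returns (threshold, left_mean, right_mean, sse), or None if every
    x value is identical and no split is possible.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        c_left = sum(left) / len(left)     # region value = average of y
        c_right = sum(right) / len(right)
        sse = (sum((y - c_left) ** 2 for y in left)
               + sum((y - c_right) ** 2 for y in right))
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[3]:
            best = (threshold, c_left, c_right, sse)
    return best


split = best_split([1, 2, 3, 4], [1.0, 1.0, 5.0, 5.0])
```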

56 TREE ENSEMBLE
- A single decision tree is often too specific and unstable (high variance)
- Many models (usually weak ones) working together result in a more accurate model
- Common tree ensembles: bagging (e.g., Random Forest, RF) and boosting
*Employed stochastic gradient boosting for SSM

57 GRADIENT BOOSTED MODEL (GBM)
- $f(x)$ is estimated by minimizing the expectation of a loss function $L(y, f(x))$:
$$\hat{f}(x) = \arg\min_{f(x)} E_{y \mid x}\left[ L(y, f(x)) \mid x \right]$$
- At each iteration, the algorithm determines the direction of best fit to the data: the negative gradient
- For squared-error loss, the algorithm corresponds to residual fitting

58 GBM ALGORITHM
1. Initialize $\hat{f}(x)$ to a constant
2. For each tree $t$:
   I. Compute the negative gradient of the loss:
   $$r_i = -\left[ \frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f = \hat{f}_{t-1}}$$
   II. Fit a tree $g(x)$ to $r_i$ from a random subset, resulting in $M$ regions
   III. Choose a step size for each region:
   $$\lambda_m = \arg\min_{\lambda} \sum_{x_i \in R_m} L(y_i, \hat{f}(x_i) + \lambda)$$
3. Update:
$$\hat{f}(x) \leftarrow \hat{f}(x) + \sum_{m=1}^{M} \lambda_m I(x \in R_m)$$
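The steps above can be sketched in plain Python for squared-error loss, one numeric feature, and two-leaf trees (stumps). Several things here are assumptions for illustration rather than details from the slides: the function names, the stump learner, the `subsample` fraction, and the `learning_rate` shrinkage factor. For squared error the best step size in each region is the mean residual, so steps II and III collapse into fitting the stump's leaf values. The presentation reports using LightGBM, not hand-written code like this.

```python
import random


def fit_stump(xs, rs):
    """Two-leaf tree on residuals; returns (threshold, left_value, right_value)."""
    pairs = sorted(zip(xs, rs))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        cl, cr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - cl) ** 2 for r in left) + sum((r - cr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, cl, cr)
    if best is None:  # no valid split: predict the mean everywhere
        mean = sum(rs) / len(rs)
        return (float("inf"), mean, mean)
    return best[1:]


def fit_gbm(xs, ys, n_trees=100, learning_rate=0.1, subsample=0.8, seed=0):
    """Stochastic gradient boosting with squared-error loss."""
    rng = random.Random(seed)
    f0 = sum(ys) / len(ys)                      # step 1: constant start
    preds = [f0] * len(xs)
    stumps = []
    k = max(1, min(len(xs), int(subsample * len(xs))))
    for _ in range(n_trees):                    # step 2: one tree per pass
        idx = rng.sample(range(len(xs)), k)     # random subset of rows
        residuals = [ys[i] - preds[i] for i in idx]  # negative gradient
        t, cl, cr = fit_stump([xs[i] for i in idx], residuals)
        stumps.append((t, cl, cr))
        preds = [p + learning_rate * (cl if x <= t else cr)  # step 3: update
                 for p, x in zip(preds, xs)]
    return (f0, stumps, learning_rate)


def predict(model, x):
    f0, stumps, learning_rate = model
    return f0 + learning_rate * sum(cl if x <= t else cr for t, cl, cr in stumps)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
model = fit_gbm(xs, ys, n_trees=100, subsample=1.0)  # full sample for a repeatable demo
```

Setting `subsample` below 1.0 gives the stochastic variant; the shrinkage factor slows each update so that many small trees share the fit.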

59 GBM FOR ACCIDENT PREDICTION
Loss functions experimented with (deviance and gradient):
- Squared error loss: deviance $(y_i - f(x_i))^2$, gradient $y_i - f(x_i)$
- Poisson error loss: deviance $-2\left(y_i f(x_i) - \exp(f(x_i))\right)$, gradient $y_i - \exp(f(x_i))$
Both models resulted in similar performance
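The two loss functions and their gradients are small enough to write out directly. The sketch below uses hypothetical function names and treats `f` as the model output on the log scale for the Poisson case. The "gradient" listed on the slide is the working residual, which equals half the negative derivative of the deviance with respect to `f`; the finite-difference check illustrates that relationship.

```python
from math import exp


def squared_error_deviance(y, f):
    return (y - f) ** 2


def squared_error_gradient(y, f):
    return y - f


def poisson_deviance(y, f):
    # Deviance up to a term that does not depend on f.
    return -2.0 * (y * f - exp(f))


def poisson_gradient(y, f):
    return y - exp(f)


def negative_half_slope(deviance, y, f, h=1e-6):
    """Central finite difference of -0.5 * d(deviance)/df."""
    return -0.5 * (deviance(y, f + h) - deviance(y, f - h)) / (2 * h)


# Illustrative values: observed count y = 2, current model output f = 0.3.
check_sq = negative_half_slope(squared_error_deviance, 2.0, 0.3)
check_po = negative_half_slope(poisson_deviance, 2.0, 0.3)
```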

60 ATO Safety Analytics Processes 60

Lecture 5. Optimisation. Regularisation

Lecture 5. Optimisation. Regularisation Lecture 5. Optimisation. Regularisation COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Andrey Kan Copyright: University of Melbourne Iterative optimisation Loss functions Coordinate

More information

knn & Naïve Bayes Hongning Wang

knn & Naïve Bayes Hongning Wang knn & Naïve Bayes Hongning Wang CS@UVa Today s lecture Instance-based classifiers k nearest neighbors Non-parametric learning algorithm Model-based classifiers Naïve Bayes classifier A generative model

More information

Bayesian Methods: Naïve Bayes

Bayesian Methods: Naïve Bayes Bayesian Methods: Naïve Bayes Nicholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior

More information

Decision Trees. Nicholas Ruozzi University of Texas at Dallas. Based on the slides of Vibhav Gogate and David Sontag

Decision Trees. Nicholas Ruozzi University of Texas at Dallas. Based on the slides of Vibhav Gogate and David Sontag Decision Trees Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Announcements Course TA: Hao Xiong Office hours: Friday 2pm-4pm in ECSS2.104A1 First homework

More information

Special Topics: Data Science

Special Topics: Data Science Special Topics: Data Science L Linear Methods for Prediction Dr. Vidhyasaharan Sethu School of Electrical Engineering & Telecommunications University of New South Wales Sydney, Australia V. Sethu 1 Topics

More information

Estimating the Probability of Winning an NFL Game Using Random Forests

Estimating the Probability of Winning an NFL Game Using Random Forests Estimating the Probability of Winning an NFL Game Using Random Forests Dale Zimmerman February 17, 2017 2 Brian Burke s NFL win probability metric May be found at www.advancednflstats.com, but the site

More information

Logistic Regression. Hongning Wang

Logistic Regression. Hongning Wang Logistic Regression Hongning Wang CS@UVa Today s lecture Logistic regression model A discriminative classification model Two different perspectives to derive the model Parameter estimation CS@UVa CS 6501:

More information

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate Mixture Models & EM Nicholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looked at -means and hierarchical clustering as mechanisms for unsupervised learning

More information

Attacking and defending neural networks. HU Xiaolin ( 胡晓林 ) Department of Computer Science and Technology Tsinghua University, Beijing, China

Attacking and defending neural networks. HU Xiaolin ( 胡晓林 ) Department of Computer Science and Technology Tsinghua University, Beijing, China Attacking and defending neural networks HU Xiaolin ( 胡晓林 ) Department of Computer Science and Technology Tsinghua University, Beijing, China Outline Background Attacking methods Defending methods 2 AI

More information

CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING Linear Regression, Logistic Regression, and GLMs Instructor: Yizhou Sun yzsun@cs.ucla.edu April 24, 2017 About WWW2017 Conference 2 Turing Award Winner Sir Tim Berners-Lee 3

More information

CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING 3: Vector Data: Logistic Regression Instructor: Yizhou Sun yzsun@cs.ucla.edu October 9, 2017 Methods to Learn Vector Data Set Data Sequence Data Text Data Classification

More information

ECO 745: Theory of International Economics. Jack Rossbach Fall Lecture 6

ECO 745: Theory of International Economics. Jack Rossbach Fall Lecture 6 ECO 745: Theory of International Economics Jack Rossbach Fall 2015 - Lecture 6 Review We ve covered several models of trade, but the empirics have been mixed Difficulties identifying goods with a technological

More information

Safety assessments for Aerodromes (Chapter 3 of the PANS-Aerodromes, 1 st ed)

Safety assessments for Aerodromes (Chapter 3 of the PANS-Aerodromes, 1 st ed) Safety assessments for Aerodromes (Chapter 3 of the PANS-Aerodromes, 1 st ed) ICAO MID Seminar on Aerodrome Operational Procedures (PANS-Aerodromes) Cairo, November 2017 Avner Shilo, Technical officer

More information

Pre-Kindergarten 2017 Summer Packet. Robert F Woodall Elementary

Pre-Kindergarten 2017 Summer Packet. Robert F Woodall Elementary Pre-Kindergarten 2017 Summer Packet Robert F Woodall Elementary In the fall, on your child s testing day, please bring this packet back for a special reward that will be awarded to your child for completion

More information

Addition and Subtraction of Rational Expressions

Addition and Subtraction of Rational Expressions RT.3 Addition and Subtraction of Rational Expressions Many real-world applications involve adding or subtracting algebraic fractions. Similarly as in the case of common fractions, to add or subtract algebraic

More information

Building an NFL performance metric

Building an NFL performance metric Building an NFL performance metric Seonghyun Paik (spaik1@stanford.edu) December 16, 2016 I. Introduction In current pro sports, many statistical methods are applied to evaluate player s performance and

More information

Aeronautical studies and Safety Assessment

Aeronautical studies and Safety Assessment Aerodrome Safeguarding Workshop Cairo, 4 6 Dec. 2017 Aeronautical studies and Safety Assessment Nawal A. Abdel Hady ICAO MID Regional Office, Aerodrome and Ground Aids (AGA) Expert References ICAO SARPS

More information

This file is part of the following reference:

This file is part of the following reference: This file is part of the following reference: Hancock, Timothy Peter (2006) Multivariate consensus trees: tree-based clustering and profiling for mixed data types. PhD thesis, James Cook University. Access

More information

Lecture 10. Support Vector Machines (cont.)

Lecture 10. Support Vector Machines (cont.) Lecture 10. Support Vector Machines (cont.) COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Andrey Kan Copyright: University of Melbourne This lecture Soft margin SVM Intuition and problem

More information

Conservation of Energy. Chapter 7 of Essential University Physics, Richard Wolfson, 3 rd Edition

Conservation of Energy. Chapter 7 of Essential University Physics, Richard Wolfson, 3 rd Edition Conservation of Energy Chapter 7 of Essential University Physics, Richard Wolfson, 3 rd Edition 1 Different Types of Force, regarding the Work they do. gravity friction 2 Conservative Forces BB WW cccccccc

More information

The Project The project involved developing a simulation model that determines outcome probabilities in professional golf tournaments.

The Project The project involved developing a simulation model that determines outcome probabilities in professional golf tournaments. Applications of Bayesian Inference and Simulation in Professional Golf 1 Supervised by Associate Professor Anthony Bedford 1 1 RMIT University, Melbourne, Australia This report details a summary of my

More information

What is Restrained and Unrestrained Pipes and what is the Strength Criteria

What is Restrained and Unrestrained Pipes and what is the Strength Criteria What is Restrained and Unrestrained Pipes and what is the Strength Criteria Alex Matveev, September 11, 2018 About author: Alex Matveev is one of the authors of pipe stress analysis codes GOST 32388-2013

More information

Imperfectly Shared Randomness in Communication

Imperfectly Shared Randomness in Communication Imperfectly Shared Randomness in Communication Madhu Sudan Harvard Joint work with Clément Canonne (Columbia), Venkatesan Guruswami (CMU) and Raghu Meka (UCLA). 11/16/2016 UofT: ISR in Communication 1

More information

Simulating Major League Baseball Games

Simulating Major League Baseball Games ABSTRACT Paper 2875-2018 Simulating Major League Baseball Games Justin Long, Slippery Rock University; Brad Schweitzer, Slippery Rock University; Christy Crute Ph.D, Slippery Rock University The game of

More information

Predicting the use of the sacrifice bunt in Major League Baseball BUDT 714 May 10, 2007

Predicting the use of the sacrifice bunt in Major League Baseball BUDT 714 May 10, 2007 Predicting the use of the sacrifice bunt in Major League Baseball BUDT 714 May 10, 2007 Group 6 Charles Gallagher Brian Gilbert Neelay Mehta Chao Rao Executive Summary Background When a runner is on-base

More information

Evaluating and Classifying NBA Free Agents

Evaluating and Classifying NBA Free Agents Evaluating and Classifying NBA Free Agents Shanwei Yan In this project, I applied machine learning techniques to perform multiclass classification on free agents by using game statistics, which is useful

More information

Human Performance Evaluation

Human Performance Evaluation Human Performance Evaluation Minh Nguyen, Liyue Fan, Luciano Nocera, Cyrus Shahabi minhnngu@usc.edu --O-- Integrated Media Systems Center University of Southern California 1 2 Motivating Application 8.2

More information

Simplifying Radical Expressions and the Distance Formula

Simplifying Radical Expressions and the Distance Formula 1 RD. Simplifying Radical Expressions and the Distance Formula In the previous section, we simplified some radical expressions by replacing radical signs with rational exponents, applying the rules of

More information

Safety Assessment of Installing Traffic Signals at High-Speed Expressway Intersections

Safety Assessment of Installing Traffic Signals at High-Speed Expressway Intersections Safety Assessment of Installing Traffic Signals at High-Speed Expressway Intersections Todd Knox Center for Transportation Research and Education Iowa State University 2901 South Loop Drive, Suite 3100

More information

DATA MINING ON CRICKET DATA SET FOR PREDICTING THE RESULTS. Sushant Murdeshwar

DATA MINING ON CRICKET DATA SET FOR PREDICTING THE RESULTS. Sushant Murdeshwar DATA MINING ON CRICKET DATA SET FOR PREDICTING THE RESULTS by Sushant Murdeshwar A Project Report Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science

More information

Physical Design of CMOS Integrated Circuits

Physical Design of CMOS Integrated Circuits Physical Design of CMOS Integrated Circuits Dae Hyun Kim EECS Washington State University References John P. Uyemura, Introduction to VLSI Circuits and Systems, 2002. Chapter 5 Goal Understand how to physically

More information

Predicting the Total Number of Points Scored in NFL Games

Predicting the Total Number of Points Scored in NFL Games Predicting the Total Number of Points Scored in NFL Games Max Flores (mflores7@stanford.edu), Ajay Sohmshetty (ajay14@stanford.edu) CS 229 Fall 2014 1 Introduction Predicting the outcome of National Football

More information

Operations on Radical Expressions; Rationalization of Denominators

Operations on Radical Expressions; Rationalization of Denominators 0 RD. 1 2 2 2 2 2 2 2 Operations on Radical Expressions; Rationalization of Denominators Unlike operations on fractions or decimals, sums and differences of many radicals cannot be simplified. For instance,

More information

BASKETBALL PREDICTION ANALYSIS OF MARCH MADNESS GAMES CHRIS TSENG YIBO WANG

BASKETBALL PREDICTION ANALYSIS OF MARCH MADNESS GAMES CHRIS TSENG YIBO WANG BASKETBALL PREDICTION ANALYSIS OF MARCH MADNESS GAMES CHRIS TSENG YIBO WANG GOAL OF PROJECT The goal is to predict the winners between college men s basketball teams competing in the 2018 (NCAA) s March

More information

Bayesian Optimized Random Forest for Movement Classification with Smartphones

Bayesian Optimized Random Forest for Movement Classification with Smartphones Bayesian Optimized Random Forest for Movement Classification with Smartphones 1 2 3 4 Anonymous Author(s) Affiliation Address email 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29

More information

Federal Aviation Administration Safety & Human Factors Analysis of a Wake Vortex Mitigation Display System

Federal Aviation Administration Safety & Human Factors Analysis of a Wake Vortex Mitigation Display System Safety & Human Factors Analysis of a Wake Vortex Mitigation Display System Presented to: EUROCONTROL Safety R&D Seminar By: Dino Piccione Date: October 23, 2008 Project Objectives Forge a link between

More information

ISyE 6414 Regression Analysis

ISyE 6414 Regression Analysis ISyE 6414 Regression Analysis Lecture 2: More Simple linear Regression: R-squared (coefficient of variation/determination) Correlation analysis: Pearson s correlation Spearman s rank correlation Variable

More information

Marine Risk Assessment

Marine Risk Assessment Marine Risk Assessment Waraporn Srimoon (B.Sc., M.Sc.).) 10 December 2007 What is Risk assessment? Risk assessment is a review as to acceptability of risk based on comparison with risk standards or criteria,

More information

Projecting Three-Point Percentages for the NBA Draft

Projecting Three-Point Percentages for the NBA Draft Projecting Three-Point Percentages for the NBA Draft Hilary Sun hsun3@stanford.edu Jerold Yu jeroldyu@stanford.edu December 16, 2017 Roland Centeno rcenteno@stanford.edu 1 Introduction As NBA teams have

More information

Deconstructing Data Science

Deconstructing Data Science Deconstructing Data Science David Bamman, UC Berkele Info 29 Lecture 4: Regression overview Feb 1, 216 Regression A mapping from input data (drawn from instance space ) to a point in R (R = the set of

More information

Product Decomposition in Supply Chain Planning

Product Decomposition in Supply Chain Planning Mario R. Eden, Marianthi Ierapetritou and Gavin P. Towler (Editors) Proceedings of the 13 th International Symposium on Process Systems Engineering PSE 2018 July 1-5, 2018, San Diego, California, USA 2018

More information

Workshop to Generate Guidelines For the Implementation of: 1 - Step 1 of State Safety Program (SSP) and 2 - Phases 1 & 2 of ICAO SMS

Workshop to Generate Guidelines For the Implementation of: 1 - Step 1 of State Safety Program (SSP) and 2 - Phases 1 & 2 of ICAO SMS Workshop to Generate Guidelines For the Implementation of: 1 - Step 1 of State Safety Program (SSP) and 2 - Phases 1 & 2 of ICAO SMS SMS Peligrando, Arriesgando y Midiendo Dr. S. Hautequest Cardoso, Ph.D.

More information

PREDICTING the outcomes of sporting events

PREDICTING the outcomes of sporting events CS 229 FINAL PROJECT, AUTUMN 2014 1 Predicting National Basketball Association Winners Jasper Lin, Logan Short, and Vishnu Sundaresan Abstract We used National Basketball Associations box scores from 1991-1998

More information

Deconstructing Data Science

Deconstructing Data Science Deconstructing Data Science David Bamman, UC Berkele Info 29 Lecture 4: Regression overview Jan 26, 217 Regression A mapping from input data (drawn from instance space ) to a point in R (R = the set of

More information

TSP at isolated intersections: Some advances under simulation environment

TSP at isolated intersections: Some advances under simulation environment TSP at isolated intersections: Some advances under simulation environment Zhengyao Yu Vikash V. Gayah Eleni Christofa TESC 2018 December 5, 2018 Overview Motivation Problem introduction Assumptions Formation

More information

A computer program that improves its performance at some task through experience.

A computer program that improves its performance at some task through experience. 1 A computer program that improves its performance at some task through experience. 2 Example: Learn to Diagnose Patients T: Diagnose tumors from images P: Percent of patients correctly diagnosed E: Pre

More information

Predicting NBA Shots
