Machine Learning Application in Aviation Safety

Surface Safety Metric and MOR Classification
Presented to: ART
By: Firdu Bati, PhD, FAA
Date: September 2018

Agenda
- Surface Safety Metric (SSM) development
  - Text mining
  - Gradient Boosting Machine (GBM)
  - Bayesian networks
  - Time series forecasting
- Surface MOR classification
  - Text mining
  - GBM

Surface Safety Metric (SSM)

Surface Metric: Background
- The primary goal of a surface safety metric is to gauge the safety performance of the airport environment.
  - All relevant events are accounted for.
- The SSM was developed because existing FAA Air Traffic Organization (ATO) runway safety metrics focus only on runway incident occurrences.
  - Existing metric: count of A and B events divided by operation count.

Surface Metric: Background
- Incident occurrence-based metrics can be deceptive:
  - Increased reporting gives the appearance of reduced safety
  - Accidents have no effect on the metric
- The ultimate goal is to prevent injury and damage, not incidents
- SSM goal: develop a risk-based surface safety metric that includes accident, Runway Excursion (RE), Runway Incursion (RI), and Surface Incident (SI) data

SSM: Approach
- Identify applicable accident and incident data using a classifier
- Use probabilistic modeling to assign risk weights to each possible accident or incident
  - Primarily outcome-based
- Calculate individual event SSM scores
- Aggregate annual SSM scores
- Generate a report to Congress

Surface Safety Metric (SSM)
- The SSM measures surface safety using all relevant accidents and incidents
- It assigns a weight to each event based on its proximity to a fatality
- It employs multiple advanced modeling techniques:
  - Text mining to extract useful information from unstructured narratives
  - Machine learning (GBM) to categorize accident types
  - Bayesian network to determine a weight for each event type
  - Time series forecasting to derive appropriate metric target values

SSM (continued): Periodic Scoring Process
- Data are gathered from NTSB accident, ASIAS RE (8 sources), and AJI RI-SI data sources.
- NTSB events are categorized as Runway Excursions, Runway Collisions, or Taxiway Collisions.
- All accidents and incidents are scored based on weights derived from the Bayesian network model.
- Most activities in this process have been automated.

Applicable Event Collection
- The number of reported events has been increasing

Weight Calculation
- Considers each accident type's proximity to fatality
- Ensures weight ordering (Injury > Damage)
- Maps incident types to accident types

Calculated Weights
Category   Outcome            Commercial Weight   Non-Commercial Weight
INJURY     Fatal              1.0000              1.0000
INJURY     Serious            0.6129              0.5221
INJURY     Minor              0.3902              0.4209
DAMAGE     Destroyed          0.1571              0.1036
DAMAGE     Substantial        0.1413              0.0669
DAMAGE     Minor              0.1335              0.0657
INCIDENT   Category A RI      0.0033              0.0026
INCIDENT   Category B RI      0.0024              0.0020
INCIDENT   Category C RI      0.0004              9.0E-5
INCIDENT   Category D RI      0.0001              8.0E-5
INCIDENT   RE Incident        0.0002              8.5E-6
INCIDENT   Surface Incident   8.3E-5              7.8E-5

Weight Application
- Accidents
  - Penalty term: number of people injured and airframes damaged, multiplied by the corresponding weights
  - Credit term: fraction of people/airframes with lesser injury/damage
  - For each accident, add the penalty terms and subtract the credit terms
- Incidents
  - Each incident is assigned its incident type weight

Credit Penalty Correction Term
Example: two accidents
- Accident A: all 5 occupants injured
- Accident B: 5 out of 200 occupants injured
Without credit term:
- Both A and B: 5 * 0.75 = 3.75
With credit term:
- A: 5 * 0.75 = 3.75
- B: 5 * 0.75 - (195 / 200) * 0.75 = 3.02
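The arithmetic in this example can be sketched in a few lines of Python. This is a minimal illustration, not the FAA implementation: the function name `accident_injury_score` is hypothetical, and the weight 0.75 is the illustrative value used on the slide rather than one of the calculated weights.

```python
def accident_injury_score(injured: int, total: int, weight: float) -> float:
    """Penalty minus credit for one accident at one injury level.

    penalty = number injured * weight
    credit  = fraction of occupants not injured * weight
    """
    penalty = injured * weight
    credit = ((total - injured) / total) * weight
    return penalty - credit


# Slide example: weight 0.75 is illustrative.
score_a = accident_injury_score(injured=5, total=5, weight=0.75)
score_b = accident_injury_score(injured=5, total=200, weight=0.75)
print(round(score_a, 2), round(score_b, 2))  # 3.75 3.02
```

With the credit term, accident B scores lower than accident A even though the same number of people were injured, because most of B's occupants were unharmed.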

Calculating Annual SSM
- Weights for commercial events and non-commercial events are summed for each fiscal year
- Commercial and Non-Commercial RSM are then normalized by millions of operations
- Note: the risk levels for commercial and non-commercial operations differ significantly
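The annual roll-up described here amounts to a group-by-year sum followed by a division. The sketch below assumes events arrive as (fiscal year, score) pairs and that operations are supplied in millions per fiscal year; the function name and all numbers are hypothetical.

```python
from collections import defaultdict


def annual_metric(events, ops_millions):
    """Sum event scores per fiscal year, then divide by millions of operations.

    events:       iterable of (fiscal_year, score) pairs
    ops_millions: dict mapping fiscal_year -> operations, in millions
    """
    totals = defaultdict(float)
    for year, score in events:
        totals[year] += score
    return {year: totals[year] / ops_millions[year] for year in totals}


# Hypothetical scores and operation counts, for illustration only.
example_events = [(2015, 1.0), (2015, 0.5), (2016, 3.0)]
example_ops = {2015: 30.0, 2016: 25.0}
example_metric = annual_metric(example_events, example_ops)
```

Running this separately on commercial and non-commercial events would give the two normalized series.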

Commercial RSM Over Time (Per Million Operations)
* Risk decreasing; operations slightly decreased over time

Non-Commercial RSM Over Time (Per Million Operations)
* Risk fluctuating and increasing in recent years; operations decreased over time

RSM Targets
- Time series forecasting is used to set the targets
[Figures: Commercial Time Series with Forecast; Non-Commercial Time Series with Forecast]

Surface Mandatory Occurrence Report (MOR) Classification

Surface MOR: Background
- The surface MOR classification task was born out of a desire to reduce the amount of manual event review performed by the Runway Incursion Analysis Team (RIAT)
- SMEs and analysts in the Runway Safety office currently review approximately 2,000 MORs per year manually to classify surface events

Surface MOR: Classification
- Classification makes use of the MOR narratives.
- The algorithm can classify 90% of these MORs with 95% accuracy.
- The remaining 10% still require manual review, but this represents a significant reduction in manual effort.
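The slides do not say how the automatically classified 90% are separated from the MORs that go to manual review. One common way to do this, shown here purely as an assumption, is to accept the classifier's label only when its top class probability clears a threshold. The function name `triage` and the 0.9 threshold are hypothetical.

```python
def triage(class_probs, threshold=0.9):
    """Route one MOR based on predicted class probabilities.

    class_probs: dict mapping class label -> predicted probability
    Returns (label, "auto") when the top probability clears the threshold,
    otherwise (None, "manual") so an analyst reviews the report.
    """
    label, prob = max(class_probs.items(), key=lambda kv: kv[1])
    if prob >= threshold:
        return label, "auto"
    return None, "manual"


confident = triage({"SI": 0.95, "RE": 0.03, "RI": 0.02})
uncertain = triage({"SI": 0.50, "RE": 0.40, "RI": 0.10})
```

Under this scheme the threshold trades off the share of MORs handled automatically against the accuracy achieved on that share.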

Current Surface MOR Classification

Enhanced Surface MOR Classification
[Diagram labels: Surface MOR; Classifier; SI, RE, D; A, B, C]

Model Performance

Approach
- Use existing categories as ground truth
- Preprocess narratives using text mining
- Apply feature engineering (n-grams, TF-IDF)
- Fit a classifier (GBM)

Ground Truth
- Use 20 years of MOR data
- Use the existing surface event classification as labels, but use the raw MOR text as input
- Events are classified into SI, RE, and RI (A/B, C, D)
- Few structured fields

Preprocessing & Cleaning
- Remove tags, exclamation marks, punctuation, numbers, extra whitespace, etc.
- Convert to lower case: standardize all text by removing case differences
  - Runway/runway, RWY/rwy

Preprocessing & Cleaning
- Remove stop words: words that help in sentence construction but carry little real information
  - Words such as a, an, the, they, where, etc.
- Stemming: convert terms to their root form, which helps capture the intent of terms more precisely
  - Words like veered and veering reduce to the root word veer
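The cleaning steps on the two preprocessing slides can be sketched with the Python standard library alone. This is an illustration under stated assumptions: the stop-word set is a small hypothetical subset, and `naive_stem` is a toy suffix stripper standing in for a real stemmer (for example a Porter stemmer), which the slides do not name.

```python
import re
from typing import List

# Illustrative subset only; a real pipeline would use a fuller stop-word list.
STOP_WORDS = {"a", "an", "the", "they", "where", "and", "of", "to"}


def naive_stem(token: str) -> str:
    """Toy stemmer: strip one common suffix, keeping at least 3 characters."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(narrative: str) -> List[str]:
    text = re.sub(r"<[^>]+>", " ", narrative)  # remove markup tags
    text = text.lower()                        # remove case differences
    text = re.sub(r"[^a-z\s]", " ", text)      # remove punctuation and numbers
    tokens = text.split()                      # split also collapses whitespace
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


tokens = preprocess("<p>The aircraft VEERED off Runway 27!</p>")
print(tokens)  # ['aircraft', 'veer', 'off', 'runway']
```

The toy stemmer mangles some words (for example, it would turn "cross" into "cros"), which is why a proper stemming library is preferable in practice.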

Feature Engineering (n-grams)
- When two or more words occur together, they may provide more information to the model
  - One word is a 1-gram: runway
  - Two words make a 2-gram: depart runway
  - Three words make a 3-gram: depart short runway
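Generating n-grams from a token list is a sliding window of width n. A minimal sketch, with the helper name `ngrams` chosen for illustration:

```python
def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["depart", "short", "runway"]
unigrams = ngrams(tokens, 1)  # ['depart', 'short', 'runway']
bigrams = ngrams(tokens, 2)   # ['depart short', 'short runway']
trigrams = ngrams(tokens, 3)  # ['depart short runway']
```

Each distinct n-gram then becomes a candidate feature for the classifier.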

Feature Engineering (TF-IDF)
- A weighting scheme based on term importance
- Product of two terms: term frequency and inverse document frequency
- Rarely occurring terms may carry more information than frequent ones
- R and Python have packages that can do these calculations easily

Gradient Boosting Machine (GBM)
- A single decision tree is often too specific and unstable (high variance)
- Many models (usually weak ones) working together result in a more accurate model
  - Committee-like decision
- Common tree ensembles: bagging, Random Forest (RF), and boosting
* Employed stochastic gradient boosting for SSM (LightGBM)

Fitting Process

Backup Slides

SSM Assumptions
- Assumed the following hierarchy of accident/incident severity, based on domain expert input: fatality, injury, aircraft damage, incidents.
- Incident-to-accident mapping assumption:
  - RIs -> Runway Collision
  - SIs -> Taxiway Collision
  - RE Incidents -> RE Accident
- Assumed the following criteria for splitting the dataset into Commercial and Non-Commercial:
  - Part 121, 129, 135: Commercial
  - Others: General Aviation / Non-Commercial
- Incident data are only available starting in 1997; however, the model used accident data since 1982.
  - This is more conservative

Technical Approach
- Prepare National Transportation Safety Board (NTSB), RI-SI Database, and RE Database data.
- Assume the worst possible accident involves fatal injury.
- Assign weights to accidents and incidents based on relative distance to fatality.

Data Pre-Processing
- NTSB data selection
  - Reports without phase codes relevant to the runway environment were removed
  - Remaining data were classified as runway collision, taxiway collision, or runway excursion by training a text mining model and validating the results
- Data merging
  - RE and RI-SI data have already been reviewed by domain experts and categorized
  - Some overlap exists between the RE and NTSB datasets; duplicates were removed (NTSB records retained)

FY16 Dataset
Source   Period      Event Type          # of Events
NTSB     FY81-FY16   Runway Collision    324
NTSB     FY81-FY16   Taxiway Collision   285
NTSB     FY81-FY16   Runway Excursions   9,520
RI-SI    FY98-FY16   A                   237
RI-SI    FY98-FY16   B                   319
RI-SI    FY98-FY16   C                   7,170
RI-SI    FY98-FY16   D                   6,288
RI-SI    FY98-FY16   SI                  9,073
RE       FY12-FY16   Runway Excursions   1,241

Three Step Weighting Process
1. Weights are assigned to types of accidents based on proximity to a fatal accident.
2. Appropriate domain expert assumptions are used to reorder the weights: Injury > Damage > Incident without injury/damage.
3. Incidents are assigned weights based on their corresponding accident types.

Information Gain as a Measure of Proximity
Marginal entropy of a fatal accident:
$$H(FA) = -\sum_{x} P(x) \log_2 P(x) \qquad (1)$$
Conditional entropy of a fatal accident given an accident type:
$$H(FA \mid Acc) = \sum_{i} P(Acc_i)\, H(FA \mid Acc_i) \qquad (2)$$
Mutual information (MI) between a fatality and any generic accident:
$$I(FA, Acc) = H(FA) - H(FA \mid Acc) \qquad (3a)$$
$$I(FA, Acc) = \sum_{fa \in FA} \sum_{acc \in Acc} p(fa, acc) \log_2 \frac{p(fa, acc)}{p(fa)\, p(acc)} \qquad (3b)$$
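Mutual information can be computed directly from a joint probability table, either as the marginal entropy minus the conditional entropy or as the double sum over the joint distribution. The sketch below checks that the two forms agree. The 2x2 joint table is hypothetical, and the Bayesian network that would supply these probabilities in the SSM is not modeled here.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def conditional_entropy(joint):
    """H(FA | Acc) = sum_j p(acc_j) * H(FA | Acc = acc_j); joint[i][j] = p(fa_i, acc_j)."""
    p_acc = [sum(col) for col in zip(*joint)]
    return sum(
        pj * entropy([row[j] / pj for row in joint])
        for j, pj in enumerate(p_acc)
        if pj > 0
    )


def mutual_information(joint):
    """I(FA, Acc) as the double sum of p * log2(p / (p_fa * p_acc))."""
    p_fa = [sum(row) for row in joint]
    p_acc = [sum(col) for col in zip(*joint)]
    return sum(
        p * math.log2(p / (p_fa[i] * p_acc[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )


# Hypothetical joint distribution: rows = fatal / not fatal, columns = two accident types.
joint = [[0.1, 0.2], [0.4, 0.3]]
mi_sum_form = mutual_information(joint)
mi_entropy_form = entropy([sum(row) for row in joint]) - conditional_entropy(joint)
```

A larger value means that knowing the accident type tells more about whether a fatality occurs, which is the sense of "proximity" used for the weights.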

Information Gain as a Measure of Proximity
- A Bayesian network is employed
  - Accounts for correlation between different outcomes
  - Computes information gain

Weights: Penalty & Credit
- Injuries: penalty and credit terms for injured and non-injured, respectively
  $$w_I = \left( \text{InjCnt} - \frac{\text{NonInjCnt}}{\text{TotalOccupants}} \right) w_i$$
- Damage: penalty and credit terms for damaged and non-damaged, respectively
  $$w_D = \left( \text{DmgCnt} - \frac{\text{NonDmgCnt}}{\text{NumAircraft}} \right) w_d$$
- Incidents: ratio of the weight of the corresponding accident type to the frequency of the incident type
  $$w_{Inc} = \frac{w_{Acc}}{n_{Inc}}$$
  where the accident-to-incident weight mapping is $W_{RC} \to W_{RI}$, $W_{TC} \to W_{SI}$, $W_{RE\_Acc} \to W_{RE\_Inc}$

Credit Penalty Correction Term Example
Two accidents:
- A: 5/5 injured
- B: 5/200 injured
Without credit term, both A and B: 5 * 0.75 = 3.75
With credit term:
- A: (5 * 0.75) = 3.75
- B: (5 * 0.75) - (0.975 * 0.75) = 3.02

Application of Weights
Two potential applications:
- Aggregation: sum of all undesired outcome weights
- Worst-outcome: weight of the worst outcome
* The choice makes little difference in the relative risk profile, primarily due to the limiting effect of the worst outcome (adding small numbers to a large number)
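The two scoring options differ only in whether an event's outcome weights are summed or reduced to their maximum. A minimal sketch follows; the function names are hypothetical, and the example event (one fatal injury plus minor damage, using the commercial weights 1.0000 and 0.1335) is an illustrative combination rather than a case from the data.

```python
def aggregate_score(outcome_weights):
    """Aggregation: sum of all undesired outcome weights for one event."""
    return sum(outcome_weights)


def worst_outcome_score(outcome_weights):
    """Worst-outcome: weight of the single worst outcome for one event."""
    return max(outcome_weights, default=0.0)


# Illustrative event: fatal injury (1.0000) plus minor damage (0.1335).
event_weights = [1.0000, 0.1335]
agg = aggregate_score(event_weights)
worst = worst_outcome_score(event_weights)
```

Because the worst outcome dominates the sum, the two scores stay close, which is the limiting effect the slide refers to.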

Combined Risk & Event Count
[Figure: Event Count vs. Cumulative Risk, 1997-2016. Series: Combined Accident Count, Combined Incident Count, Combined Weight (MA), Poly. (Combined Weight (MA))]
* Risk decreased, incidents increased, and accidents stayed relatively constant over time

Commercial Risk & Event Count
* Risk decreased, incidents increased, and accidents stayed relatively constant over time

Non-Commercial Risk & Event Count
* Risk decreased, incidents increased, and accidents stayed relatively constant over time

Commercial Risk & Ops (Per Million)
[Figure: Normalized Commercial Risk & Operations, 1997-2016. Axes: Risk index; Millions of Operations. Series: Commercial Operations, Commercial Normalized (MA), Poly. (Commercial Normalized (MA))]
* Risk decreased; operations slightly decreased over time

Non-Commercial Risk & Ops (Per Million)
[Figure: Normalized Non-Commercial Risk & Operations, 1997-2016. Axes: Risk index; Millions of Operations. Series: Non-Commercial Operations, Non-Commercial Normalized (MA), Poly. (Non-Commercial Normalized (MA))]
* Risk increased; operations decreased over time

Commercial Target
- 95% confidence value of time series forecast: 0.5

Non-Commercial Target
- 95% confidence value of time series forecast: 2.6
- Forecast value of time series: 1.96

Forecast Models: Commercial Target

Forecast Models: Non-Commercial Target

Feature Engineering (TF-IDF)
$$tf(t, d) = k + (1 - k) \frac{f_{t,d}}{\max\{f_{t',d} : t' \in d\}}$$
- k is a smoothing term, and max(.) reduces the bias toward longer documents
$$idf(t, D) = \log \frac{N}{1 + |\{d \in D : t \in d\}|}$$
- Measures how common or rare a term is in the corpus; N is the number of documents
$$tf\_idf(t, d, D) = tf(t, d) \cdot idf(t, D)$$
- Takes a high value for terms with high frequency in a document and low frequency in the corpus
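The weighting on this slide can be sketched with the standard library: a smoothed term frequency k + (1 - k) * f / max(f), an inverse document frequency log(N / (1 + number of documents containing the term)), and their product. The default k = 0.5 and the tiny corpus are assumptions made for illustration; in practice a package routine would normally be used instead.

```python
import math
from collections import Counter


def tf(term, doc_tokens, k=0.5):
    """Smoothed term frequency: k + (1 - k) * f(term) / max term count in the document."""
    counts = Counter(doc_tokens)
    if not counts:
        return 0.0
    return k + (1 - k) * counts[term] / max(counts.values())


def idf(term, corpus):
    """log(N / (1 + number of documents containing the term)), N = corpus size."""
    n_containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / (1 + n_containing))


def tf_idf(term, doc_tokens, corpus, k=0.5):
    return tf(term, doc_tokens, k) * idf(term, corpus)


# Hypothetical tokenized narratives.
corpus = [
    ["veer", "runway"],
    ["runway", "cross"],
    ["taxi", "runway"],
    ["veer", "grass"],
]
common = tf_idf("runway", corpus[0], corpus)  # term in 3 of 4 documents
rare = tf_idf("grass", corpus[3], corpus)     # term in 1 of 4 documents
```

The rare term receives the larger weight, matching the slide's point that rarely occurring terms may carry more information.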

Gradient Boosting Machine

REGRESSION TREES
- Simple, non-parametric supervised learning
- Fits a piecewise constant function using recursive partitioning
[Figures: Tree Representation; Partitions; Prediction Surface]

REGRESSION TREES
$$T(x; \Theta) = \sum_{m=1}^{M} c_m I(x \in R_m), \qquad \Theta = \{R_m, c_m\}_{1}^{M}$$
Minimize loss:
$$\hat{\Theta} = \arg\min_{\Theta} \sum_{m=1}^{M} \sum_{x_i \in R_m} L(y_i, c_m)$$
- Find regions using greedy search and estimate each region's value as the average, $c_m = \text{avg}(y_i \mid x_i \in R_m)$
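One step of the greedy search can be illustrated with a single split on a single feature: try every threshold between consecutive feature values, set each side's value to the average of its targets, and keep the split with the lowest squared error. This is a simplified sketch (one feature, one split, squared-error loss); a full regression tree applies the same step recursively. The function name and data are hypothetical.

```python
def best_split(xs, ys):
    """Greedy search for one split; returns (threshold, left_mean, right_mean, sse).

    Returns None when no threshold separates the data (fewer than two distinct x).
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal feature values cannot be separated
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        c_left = sum(left) / len(left)      # region value = average of targets
        c_right = sum(right) / len(right)
        sse = sum((y - c_left) ** 2 for y in left) + sum((y - c_right) ** 2 for y in right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[3]:
            best = (threshold, c_left, c_right, sse)
    return best


split = best_split([1, 2, 3, 4], [1.0, 1.0, 5.0, 5.0])
print(split)  # (2.5, 1.0, 5.0, 0.0)
```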

TREE ENSEMBLE
- A single decision tree is often too specific and unstable (high variance)
- Many models (usually weak ones) working together result in a more accurate model
- Common tree ensembles: bagging, Random Forest (RF), and boosting
* Employed stochastic gradient boosting for SSM

GRADIENT BOOSTED MODEL (GBM)
- $\hat{f}(x)$ is estimated by minimizing the expectation of a loss function, $L(y, f(x))$:
$$\hat{f}(x) = \arg\min_{f(x)} E_{y \mid x}\left[ L(y, f(x)) \mid x \right]$$
- At each iteration, the algorithm determines the direction of best fit to the data: the negative gradient
- For squared-error loss, the algorithm corresponds to residual fitting

GBM ALGORITHM
1. Initialize $\hat{f}(x)$ to a constant
2. For each tree t:
   I. Compute the negative gradient of the loss: $r_i = -\left[ \frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f = f_{t-1}}$
   II. Fit a tree $g(x)$ to $r_i$ from a random subset, resulting in M regions
   III. Choose a step size for each region: $\lambda_m = \arg\min_{\lambda} \sum_{x_i \in R_m} L(y_i, f(x_i) + \lambda)$
3. Update: $\hat{f}(x) \leftarrow \hat{f}(x) + \sum_{m=1}^{M} \lambda_m I(x \in R_m)$
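The algorithm can be sketched end to end for the simplest case: squared-error loss, one feature, and one-split trees (stumps). For squared error the negative gradient is the residual (up to a constant factor), and the best step size for a region is the mean residual in that region, so each stump's leaf values serve directly as the step sizes. This is an illustrative sketch, not the LightGBM configuration used in the work; all names and parameter defaults are assumptions.

```python
import random


def fit_stump(xs, residuals):
    """Best single split by squared error; returns (threshold, left_value, right_value)."""
    pairs = sorted(zip(xs, residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        c_l, c_r = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - c_l) ** 2 for r in left) + sum((r - c_r) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, c_l, c_r)
    if best is None:  # no valid split: fall back to a single constant region
        mean = sum(residuals) / len(residuals)
        return (float("inf"), mean, mean)
    return best[1:]


def predict(model, x):
    f0, stumps = model
    return f0 + sum(c_l if x <= thr else c_r for thr, c_l, c_r in stumps)


def fit_gbm(xs, ys, n_trees=50, subsample=0.8, seed=0):
    rng = random.Random(seed)
    f0 = sum(ys) / len(ys)                      # step 1: initialize to a constant
    preds = [f0] * len(xs)
    stumps = []
    k = min(len(xs), max(2, int(subsample * len(xs))))
    for _ in range(n_trees):                    # step 2: one stump per iteration
        idx = rng.sample(range(len(xs)), k)     # random subset (stochastic boosting)
        resid = [ys[i] - preds[i] for i in idx] # negative gradient of squared loss
        thr, c_l, c_r = fit_stump([xs[i] for i in idx], resid)
        stumps.append((thr, c_l, c_r))
        # step 3: update the fit; leaf means are the per-region step sizes
        preds = [p + (c_l if x <= thr else c_r) for p, x in zip(preds, xs)]
    return f0, stumps


# Hypothetical step-shaped data.
xs = list(range(10))
ys = [0.0] * 5 + [10.0] * 5
model = fit_gbm(xs, ys)
```

Production libraries add shrinkage (a learning rate), deeper trees, and regularization on top of this basic loop.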

GBM FOR ACCIDENT PREDICTION
Loss functions experimented with (deviance and gradient):
- Squared error loss: deviance $(y_i - f(x_i))^2$; gradient $y_i - f(x_i)$
- Poisson loss: deviance $-2\,\left(y_i f(x_i) - \exp(f(x_i))\right)$; gradient $y_i - \exp(f(x_i))$
Both models resulted in similar performance
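The two losses and the gradients used for boosting can be written out and checked numerically. In the sketch below, f is the model output; for the Poisson loss it is assumed to be on the log scale, so the predicted count is exp(f). The "gradient" functions return the negative gradient of the deviance divided by 2, which is the quantity the trees are fitted to. The function names are hypothetical.

```python
import math


def squared_error_deviance(y, f):
    return (y - f) ** 2


def squared_error_gradient(y, f):
    """Negative gradient of the squared-error deviance, divided by 2 (the residual)."""
    return y - f


def poisson_deviance(y, f):
    """Poisson deviance up to terms that do not depend on f."""
    return -2.0 * (y * f - math.exp(f))


def poisson_gradient(y, f):
    """Negative gradient of the Poisson deviance, divided by 2."""
    return y - math.exp(f)


def numeric_negative_half_gradient(deviance, y, f, h=1e-6):
    """Central finite difference of -0.5 * d(deviance)/df, used as a check."""
    return -0.5 * (deviance(y, f + h) - deviance(y, f - h)) / (2 * h)


check_sq = numeric_negative_half_gradient(squared_error_deviance, 3.0, 0.5)
check_po = numeric_negative_half_gradient(poisson_deviance, 3.0, 0.5)
```

The finite-difference values should match `squared_error_gradient(3.0, 0.5)` and `poisson_gradient(3.0, 0.5)` to within numerical error.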

ATO Safety Analytics Processes