Uncovering Anomalous Usage of Medical Records via Social Network Analysis


TRUST Autumn 2011 Conference
Uncovering Anomalous Usage of Medical Records via Social Network Analysis
You Chen, Ph.D.
Biomedical Informatics Dept., School of Medicine; EECS Dept., School of Engineering
November 2, 2011
(Joint work with Bradley Malin, Steve Nyemba, and Wen Zhang)


Two Typical Attacks
(1) Anomalous users (detection at the user level): intruders have little knowledge of the system and the anticipated behavior.
(2) Anomalous accesses (detection at the access level): intruders have complete knowledge of the system and its policies.

Related Research
Access control models (role-based, situation-based, role mining): do not capture the dynamic relationships among users in collaborative information systems.
Auditing models: do not offer stability of the access control model over time.
[Figure labels: Community Anomaly Detection; Specialized Anomaly Detection; Machine Learning; K-Nearest Neighbors; Spectral Analysis; High-Volume Users; Case-Based Reasoning]

Two general objects of a health information system: U(sers) and S(ubjects), linked by accesses.
[Figure: users connected by accesses to subjects S1, S3, S5; behavioral modeling]

Where are We Going?
User Level Anomaly Detection: Community Anomaly Detection System (CADS) (ACM CODASPY '11)
Access Level Anomaly Detection: Specialized Network Anomaly Detection (SNAD) (IEEE ISI '11)

Social Networks are a Novel Approach to Discovery of Electronic Medical Record Misuse
CADS: leverages a global view of the network.
SNAD: leverages a local view of the network.

Example Environments
Electronic Health Records (EHR): Vanderbilt University Medical Center StarPanel logs, 6 months in 2006.
An arbitrary week: 2,300 users, 35,000 patient records, 66,000 accesses.

Where are We Going?
User Level: Community Anomaly Detection System (CADS) (ACM CODASPY '11): Framework of CADS; An Example of CADS; Experimental Evaluation; Limitation
Access Level: Specialized Network Anomaly Detection (SNAD) (IEEE ISI '11)

Community-Based Anomaly Detection (CADS)
[Figure: <user, subject, time> access tuples; nearest neighbor networks; communities of users]


Framework of CADS
[Figure: bipartite graph; communities derivation (distance measurement, nearest neighbor network); communities; deviation measurement; deviation scores]
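The first stages of this framework can be sketched in a few lines. The slides do not say how the distance between two users is measured, so the Jaccard distance over the sets of subjects each user accessed is an assumption made here for illustration, not the CADS measure. The helper names `subjects_by_user`, `jaccard_distance`, and `k_nearest_neighbors` are hypothetical.

```python
from collections import defaultdict


def subjects_by_user(accesses):
    """Turn (user, subject) access pairs into a user -> set-of-subjects map.

    This is one side of the bipartite user/subject graph.
    """
    graph = defaultdict(set)
    for user, subject in accesses:
        graph[user].add(subject)
    return dict(graph)


def jaccard_distance(a, b):
    """Assumed distance: 1 - |A & B| / |A | B| (not taken from the slides)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def k_nearest_neighbors(graph, k):
    """For every user, list the k closest other users as (distance, user)."""
    neighbors = {}
    for u in graph:
        ranked = sorted(
            (jaccard_distance(graph[u], graph[v]), v) for v in graph if v != u
        )
        neighbors[u] = ranked[:k]
    return neighbors


# Toy access log: u1 and u2 share patients, u4 touches an unrelated record.
accesses = [
    ("u1", "s1"), ("u1", "s2"),
    ("u2", "s1"), ("u2", "s2"),
    ("u3", "s2"), ("u3", "s3"),
    ("u4", "s9"),
]
graph = subjects_by_user(accesses)
knn = k_nearest_neighbors(graph, k=2)
```

In this toy log, `u4` is at distance 1.0 from every other user, which is the kind of isolation the later deviation measurement is meant to surface.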

How Do We Set k in k-NN?
Conductance: a measure of community quality (Kannan et al.)

Minimum conductance at k = 6
[Figure: minimum conductance as a function of k]
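For reference, the conductance of a single community can be computed as the number of edges leaving it divided by the smaller of the two edge volumes on either side of the cut. The sketch below assumes an undirected, unweighted network given as an edge list. How CADS aggregates conductance over communities for each k is not stated on the slides, and the function name `conductance` is hypothetical.

```python
def conductance(edges, community):
    """Conductance of `community` in an undirected, unweighted graph.

    cut(S) / min(vol(S), vol(V - S)), where vol counts edge endpoints.
    Lower values indicate a better separated community.
    """
    community = set(community)
    cut = 0
    vol_in = 0
    vol_out = 0
    for u, v in edges:
        u_in = u in community
        v_in = v in community
        if u_in != v_in:
            cut += 1  # edge crosses the community boundary
        vol_in += int(u_in) + int(v_in)
        vol_out += int(not u_in) + int(not v_in)
    denom = min(vol_in, vol_out)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has no edges")
    return cut / denom


# Two triangles joined by the single edge (c, d).
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("c", "d"),
    ("d", "e"), ("e", "f"), ("d", "f"),
]
tight = conductance(edges, {"a", "b", "c"})  # one crossing edge out of volume 7
loose = conductance(edges, {"a"})            # both of a's edges cross the cut
```

Evaluating such a score for the networks built with different values of k and keeping the k with the lowest value mirrors the selection described on the slide.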

Example 6-Nearest Neighbor Network (1 day of accesses)
The average clustering coefficient for this network is 0.48, which is significantly larger than the 0.001 observed for random networks.
Users exhibit collaborative behavior in the health information system.
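The average clustering coefficient quoted above is the mean, over all nodes, of the fraction of a node's neighbor pairs that are themselves connected. A minimal sketch follows, assuming the nearest neighbor network is treated as undirected and stored as an adjacency map. The function name `average_clustering` is hypothetical.

```python
from itertools import combinations


def average_clustering(adjacency):
    """Mean local clustering coefficient of an undirected graph.

    `adjacency` maps each node to the set of its neighbors. Nodes with
    fewer than two neighbors contribute a coefficient of 0.
    """
    total = 0.0
    for node, nbrs in adjacency.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count neighbor pairs that are directly linked to each other.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adjacency[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adjacency)


# A triangle (a, b, c) with one pendant node d attached to c.
adjacency = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
coefficient = average_clustering(adjacency)  # (1 + 1 + 1/3 + 0) / 4
```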

Measuring Deviation from k-NN
Every user is assigned a radius d: the distance to his k-th nearest neighbor.
The smaller the radius, the higher the density in the user's network.
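The radius itself is simple to compute once pairwise distances are available. The sketch below assumes a precomputed distance map and uses made-up one-dimensional positions purely to generate distances. The function name `knn_radius` is hypothetical, and the step from radii to the final CADS deviation score is not shown on the slide and is not reproduced here.

```python
def knn_radius(distances, user, k):
    """Distance from `user` to its k-th nearest neighbor.

    `distances[user]` maps every other user to a distance.
    """
    ranked = sorted(d for other, d in distances[user].items() if other != user)
    if k < 1 or k > len(ranked):
        raise ValueError("k must be between 1 and the number of other users")
    return ranked[k - 1]


# Hypothetical positions on a line; u4 sits far from the other three users.
positions = {"u1": 0.0, "u2": 1.0, "u3": 2.0, "u4": 10.0}
distances = {
    a: {b: abs(pa - pb) for b, pb in positions.items() if b != a}
    for a, pa in positions.items()
}
radii = {u: knn_radius(distances, u, k=2) for u in positions}
sparsest = max(radii, key=radii.get)  # user with the largest radius
```

A large radius means the user's neighborhood is sparse, so `u4` stands out from the densely packed `u1` to `u3`.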


Experimental Design
Datasets are not annotated for illicit behavior. We simulated users in several settings to test:
Sensitivity to number of records accessed: range from 1 to 1,000.
Sensitivity to number of anomalous users: simulated users correspond to 0.5% to 5% of total users; number of records accessed fixed to 5.
Sensitivity to diversity: random number of users and records accessed.

Deviation and Detection Rate Increase with Number of Subjects Accessed
[Figure: two panels plotted against Patients Accessed; axis labels: Deviation, False Positive Rate]

Detection Rate With Various Mix Rates of Real and Simulated Users
[Figure: True Positive Rate vs. False Positive Rate]

CADS Outperforms Competitors (mix rate = 0.5%)
[Figure: True Positive Rate vs. False Positive Rate]


Some Limitations
Simulated users are indicative of misuse of the system, but actual illicit behavior may be more directed.
False positives are not necessarily false! (Adjudication by EHR privacy experts is under way.)
The tool needs to be specialized to account for the semantics of users and subjects. User: {Role, Department, Residence}. Patient: {Diagnosis, Procedure, Demographics, Residence}.
CADS detects anomalous users, not anomalous accesses. It needs to account for insiders that deviate by only a couple of actions. Work is under way (about to be submitted), but its detection is local, not global.

Where are We Going?
User Level: Community Anomaly Detection System (CADS) (ACM CODASPY '11)
Access Level: Specialized Network Anomaly Detection (SNAD) (IEEE ISI '11): Framework of SNAD; An Example of SNAD; Experimental Evaluation; Limitation

SNAD Framework


User Modeling

Access Network Construction

Access Network Measurement

Measuring Accesses for Changes in Network Similarity
Example access: u1 -> s3
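One way to read this step is as a leave-one-out test: take the network of users who accessed a subject, measure how similar those users are to each other, and see how much that similarity changes when one user is removed. The sketch below follows that reading under stated assumptions. The pairwise similarity (Jaccard over accessed subjects), the averaging, and the names `network_similarity` and `access_score` are all illustrative choices, not the definitions used by SNAD.

```python
from itertools import combinations


def jaccard(a, b):
    """Assumed pairwise similarity between two users' subject sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def network_similarity(users, profile):
    """Mean pairwise similarity among the users of one access network."""
    pairs = list(combinations(sorted(users), 2))
    if not pairs:
        return 0.0
    return sum(jaccard(profile[a], profile[b]) for a, b in pairs) / len(pairs)


def access_score(user, subject, profile):
    """Change in network similarity when `user` is left out.

    Positive values mean the network is more cohesive without the user,
    which makes the access user -> subject look suspicious.
    """
    network = {u for u, subjects in profile.items() if subject in subjects}
    with_user = network_similarity(network, profile)
    without_user = network_similarity(network - {user}, profile)
    return without_user - with_user


# u2, u3, u4 work on the same patients; u1 reaches s3 from an unrelated set.
profile = {
    "u1": {"s3", "s7", "s8"},
    "u2": {"s3", "s4", "s5"},
    "u3": {"s3", "s4", "s5"},
    "u4": {"s3", "s4"},
}
scores = {u: access_score(u, "s3", profile) for u in profile}
```

With this toy data only the access u1 -> s3 receives a positive score, because removing `u1` is the only change that raises the similarity of the remaining network.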


Experimental Design
Datasets are not annotated for illicit behavior. We simulated users in several settings to test:
Sensitivity to number of subjects accessed: range from 1 to 1,00.
Sensitivity to number of anomalous users: range from 2 to 20; number of subjects accessed fixed to 5.
Sensitivity to diversity: random number of users and subjects accessed.

SNAD: Deviation Rate Increases with Number of Subjects Accessed
[Figure: AUC vs. number of subjects the intruder accesses]
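The AUC on the vertical axis summarizes a ROC curve in one number: the probability that a randomly chosen simulated (anomalous) access is scored higher than a randomly chosen real one. A minimal rank-based sketch, with made-up scores and labels, is shown below. The function name `auc` is hypothetical, and nothing here reproduces the experimental data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    `labels` are truthy for anomalous items. Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative item")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up deviation scores; label 1 marks a simulated intruder.
example_scores = [0.9, 0.8, 0.3, 0.2, 0.1]
example_labels = [1, 0, 1, 0, 0]
example_auc = auc(example_scores, example_labels)  # 5 of 6 pairs ranked correctly
```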

SNAD: Deviation Rate Increases with Number of Intruders
[Figure: AUC vs. number of intruders]

SNAD Outperforms Competitors When the Number of Intruders and Accessed Subjects is Random
[Figure: True Positive Rate vs. False Positive Rate]


Limitations
SNAD has high performance in Vanderbilt's EHR system because the organization is collaborative and its access networks have high network similarity.
SNAD may not be appropriate for a large access network with low network similarity. In such a network, the absence of a user has little influence on the similarity.
[Figure: similarity of network vs. size of network]

Conclusions
Social network analysis, as applied in CADS and SNAD, is an effective way to detect anomalous usage of electronic health records.
Adding semantic information about users and subjects will make the results of social network analysis more understandable.

References
Y. Chen and B. Malin. Detection of anomalous insiders in collaborative environments via relational analysis of access logs. In Proceedings of the ACM Conference on Data and Application Security and Privacy, pages 63-74, 2011. (CADS)
Y. Chen, S. Nyemba, W. Zhang, and B. Malin. Leveraging social networks to detect anomalous insider actions in collaborative environments. In Proceedings of the 9th IEEE Intelligence and Security Informatics, pages 119-124, 2011. (SNAD)
R. Gallagher, S. Sengupta, G. Hripcsak, R. Barrows, and P. Clayton. An audit server for monitoring usage of clinical information systems. In Proceedings of the AMIA Symposium, page 1002, 1998.
A. A. Boxwala, J. Kim, J. M. Grillo, and L. O. Machado. Using statistical and machine learning to help institutions detect suspicious access to electronic health records. Journal of the American Medical Informatics Association, 18:498-505, 2011.
Y. Liao and V. R. Vemuri. Use of k-nearest neighbor classifier for intrusion detection. Computers & Security, 21(5):439-448, 2002.
R. Kannan, S. Vempala, and A. Vetta. On clusterings: Good, bad and spectral. Journal of the ACM, 51(3):497-515, 2004.
D. Fabbri and K. LeFevre. Explanation-based auditing. In Proceedings of the 38th International Conference on Very Large Data Bases, 2012 (to appear).
M. Shyu, S. Chen, K. Sarinnapakorn, and L. Chang. A novel anomaly detection scheme based on principal component classifier. In IEEE Foundations and New Directions of Data Mining Workshop, pages 172-179, 2003.

Acknowledgements
Vanderbilt: Erik Boczko, Ph.D., Ph.D.; Josh Denny, M.D.; Dario Giuse, Dr. Ing; Bradley Malin, Ph.D.; Steve Nyemba, M.S.; John Paulett, M.S.; Jian Tian; Wen Zhang, M.S.
UIUC: Carl Gunter, Ph.D.; Igor Svecs
Northwestern: David Liebovitz, M.D.; Sanjay Mehotra, Ph.D.
Funding: National Science Foundation CCF-0424422 (TRUST) and CNS-0964063; National Institutes of Health R01LM010207

Questions? Comments?
you.chen@vanderbilt.edu
Health Information Privacy Lab: http://www.hiplab.org/

SNAD assumes that access scores are approximately distributed around a well-centered mean.
[Figure: number of accesses vs. access score]

The correlation coefficient between the real and Laplace distributions is 0.886.
[Figure: percentage of accesses vs. access score]
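A comparison of this kind can be sketched by fitting a Laplace density to the scores and correlating it with the observed histogram. The slide does not say how the 0.886 figure was computed, so every choice below is an assumption: the median and mean absolute deviation as the fitted parameters, 21 equal-width bins, and Pearson correlation between bin densities and the fitted density at bin centers. The function names are hypothetical, and the sample scores are synthetic.

```python
import math


def laplace_pdf(x, mu, b):
    """Laplace density with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


def laplace_quantile(p, mu=0.0, b=1.0):
    """Inverse CDF of the Laplace distribution, for 0 < p < 1."""
    return mu - b * math.copysign(1.0, p - 0.5) * math.log(1.0 - 2.0 * abs(p - 0.5))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def laplace_fit_correlation(scores, bins=21):
    """Correlate a histogram of `scores` with a fitted Laplace density."""
    ordered = sorted(scores)
    n = len(ordered)
    mu = (ordered[(n - 1) // 2] + ordered[n // 2]) / 2.0  # median
    b = sum(abs(s - mu) for s in ordered) / n             # mean absolute deviation
    lo, hi = ordered[0], ordered[-1]
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in ordered:
        counts[min(int((s - lo) / width), bins - 1)] += 1
    observed = [c / (n * width) for c in counts]          # empirical density
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    expected = [laplace_pdf(c, mu, b) for c in centers]
    return pearson(observed, expected)


# Synthetic scores placed at evenly spaced Laplace quantiles (not real data).
synthetic = [laplace_quantile((i + 0.5) / 1000) for i in range(1000)]
r = laplace_fit_correlation(synthetic)
```

Scores drawn from a Laplace-like distribution give a correlation close to 1, while a heavily skewed or multimodal score distribution would score noticeably lower.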