Replay using Recomposition: Alignment-Based Conformance Checking in the Large

Similar documents
Decomposing Replay Problems: A Case Study

APPLICATION OF TOOL SCIENCE TECHNIQUES TO IMPROVE TOOL EFFICIENCY FOR A DRY ETCH CLUSTER TOOL. Robert Havey Lixin Wang DongJin Kim

Online Companion to Using Simulation to Help Manage the Pace of Play in Golf

A Novel Approach to Predicting the Results of NBA Matches

ITAndroids 2D Soccer Simulation Team Description 2016

CAAD CTF 2018 Rules June 21, 2018 Version 1.1

EFFICIENCY OF TRIPLE LEFT-TURN LANES AT SIGNALIZED INTERSECTIONS

A Nomogram Of Performances In Endurance Running Based On Logarithmic Model Of Péronnet-Thibault

RUGBY is a dynamic, evasive, and highly possessionoriented

Basketball field goal percentage prediction model research and application based on BP neural network

Product Decomposition in Supply Chain Planning

AN AUTONOMOUS DRIVER MODEL FOR THE OVERTAKING MANEUVER FOR USE IN MICROSCOPIC TRAFFIC SIMULATION

Optimizing Cyclist Parking in a Closed System

EE 364B: Wind Farm Layout Optimization via Sequential Convex Programming

Electromyographic (EMG) Decomposition. Tutorial. Hamid R. Marateb, PhD; Kevin C. McGill, PhD

OIL AND GAS INDUSTRY

Control-Flow Pattern Based Transformation from UML Activity Diagram to YAWL

A IMPROVED VOGEL S APPROXIMATIO METHOD FOR THE TRA SPORTATIO PROBLEM. Serdar Korukoğlu 1 and Serkan Ballı 2.

Post-Placement Functional Decomposition for FPGAs

Volume-to-Capacity Estimation of Signalized Road Networks for Metropolitan Transportation Planning

SHIMADZU LC-10/20 PUMP

2600T Series Pressure Transmitters Plugged Impulse Line Detection Diagnostic. Pressure Measurement Engineered solutions for all applications

Fig. 2 Superior operation of the proposed intelligent wind turbine generator. Fig.3 Experimental apparatus for the model wind rotors

GOLOMB Compression Technique For FPGA Configuration

AN EVALUATION OF ENCAPSULATED OIL FLOODED SCREW COMPRESSOR SYSTEMS. Nikola Stosic, Ahmed Kovacevic, ian K Smith and Elvedin Mujic

Evaluation of Work Zone Strategies at Signalized Intersections

Analysis of Run-Off-Road Crashes in Relation to Roadway Features and Driver Behavior

AIS data analysis for vessel behavior during strong currents and during encounters in the Botlek area in the Port of Rotterdam

Modelling of Extreme Waves Related to Stability Research

FIG: 27.1 Tool String

Analysis of Pressure Rise During Internal Arc Faults in Switchgear

REASONS FOR THE DEVELOPMENT

Open Research Online The Open University s repository of research publications and other research outputs

Evaluation and further development of car following models in microscopic traffic simulation

Reduction of Bitstream Transfer Time in FPGA

Visual Traffic Jam Analysis Based on Trajectory Data

Modeling of Hydraulic Hose Paths

#19 MONITORING AND PREDICTING PEDESTRIAN BEHAVIOR USING TRAFFIC CAMERAS

A SEMI-PRESSURE-DRIVEN APPROACH TO RELIABILITY ASSESSMENT OF WATER DISTRIBUTION NETWORKS

Service & Support. Questions and Answers about the Proof Test Interval. Proof Test According to IEC FAQ August Answers for industry.

Queue analysis for the toll station of the Öresund fixed link. Pontus Matstoms *

Evaluation of the Relationship between Natural Ventilation and the Grouping of Five Year Old Children in a Kindergarten Classroom of Medellin

Collision Avoidance System using Common Maritime Information Environment.

Control-flow Pattern Based Transformation from UML Activity Diagram to YAWL

Fisher FIELDVUE DVC6200f Digital Valve Controller PST Calibration and Testing using ValveLink Software

D-Case Modeling Guide for Target System

Miscalculations on the estimation of annual energy output (AEO) of wind farm projects

Pedestrian traffic flow operations on a platform: observations and comparison with simulation tool SimPed

Data Sheet T 8389 EN. Series 3730 and 3731 Types , , , and. EXPERTplus Valve Diagnostic

Motion Control of a Bipedal Walking Robot

Supplementary Material for Bayes Merging of Multiple Vocabularies for Scalable Image Retrieval

Road Data Input System using Digital Map in Roadtraffic

Application of Bayesian Networks to Shopping Assistance

INCLINOMETER DEVICE FOR SHIP STABILITY EVALUATION

Fuel Economy Fiscal Measures - Feebate Tool Training

Simulating Street-Running LRT Terminus Station Options in Dense Urban Environments Shaumik Pal, Rajat Parashar and Michael Meyer

A PHASE-AMPLITUDE ITERATION SCHEME FOR THE OPTIMIZATION OF DETERMINISTIC WAVE SEQUENCES

Quantifying the Bullwhip Effect of Multi-echelon System with Stochastic Dependent Lead Time

MEETPLANNER DESIGN DOCUMENT IDENTIFICATION OVERVIEW. Project Name: MeetPlanner. Project Manager: Peter Grabowski

City of Vestavia Hills Traffic Calming Policy for Residential Streets

Safety Analysis Methodology in Marine Salvage System Design

INFLUENTIAL EVALUATION OF DATA SAMPLING TECHNIQUES ON ACCURACY OF MOTORWAY CRASH RISK ASSESSMENT MODELS

E. Agu, M. Kasperski Ruhr-University Bochum Department of Civil and Environmental Engineering Sciences

User s Guide for inext Online: Software for Interpolation and

Special edition paper

TESLAGON. ShotHelper Manual. How to install and use the Program. Version /30/2014

Rearrangement of Recognized Strokes in Online Handwritten Gurmukhi. words recognition.

MIL-STD-883G METHOD

Heart Rate Prediction Based on Cycling Cadence Using Feedforward Neural Network

AGA Swiss McMahon Pairing Protocol Standards

Bicycling Simulator Calibration: Proposed Framework

Breaking Up is Hard to Do: An Investigation of Decomposition for Assume-Guarantee Reasoning

Rescue Rover. Robotics Unit Lesson 1. Overview

Combination Analysis Tutorial

DATA MINING ON CRICKET DATA SET FOR PREDICTING THE RESULTS. Sushant Murdeshwar

Staking plans in sports betting under unknown true probabilities of the event

The effect of close proximity, low height barriers on railway noise

PSM I PROFESSIONAL SCRUM MASTER

Cricket umpire assistance and ball tracking system using a single smartphone camera

Laboratory Hardware. Custom Gas Chromatography Solutions WASSON - ECE INSTRUMENTATION. Engineered Solutions, Guaranteed Results.

Pressure Sensor Bridge Configurations

7 th International Conference on Wind Turbine Noise Rotterdam 2 nd to 5 th May 2017

Parsimonious Linear Fingerprinting for Time Series

Decompression Method For Massive Compressed Files In Mobile Rich Media Applications

A STUDY ON GAP-ACCEPTANCE OF UNSIGNALIZED INTERSECTION UNDER MIXED TRAFFIC CONDITIONS

Application of Dijkstra s Algorithm in the Evacuation System Utilizing Exit Signs

Characterizing Ireland s wave energy resource

On the Impact of Garbage Collection on flash-based SSD endurance

Operating Manual. SUPREMA Calibration. Software for Fire and Gas Warning Units. Order No.: /01. MSAsafety.com

Fig 2.17: Topographic model: snow-free area (Snow boundary = red) Contour interval of 1 m.

Simulation Analysis of Intersection Treatments for Cycle Tracks

CALIBRATION OF THE PLATOON DISPERSION MODEL BY CONSIDERING THE IMPACT OF THE PERCENTAGE OF BUSES AT SIGNALIZED INTERSECTIONS

DETERMINATION OF OPERATING PARAMETERS IN MILKING ROBOTS WITH FREE COW TRAFFIC

At each type of conflict location, the risk is affected by certain parameters:

Assessment Summary Report Gulf of Mexico Red Snapper SEDAR 7

APPLICATION NOTES DELOMATIC 4, DM-4 GAS

SAFE CAPACITY TEST: INNOVATION AND SAVINGS

Uncovering Anomalous Usage of Medical Records via Social Network Analysis

Safety Assessment of Installing Traffic Signals at High-Speed Expressway Intersections

This file is part of the following reference:

Transcription:

Replay using Recomposition: Alignment-Based Conformance Checking in the Large Wai Lam Jonathan Lee 2, H.M.W. Verbeek 1, Jorge Munoz-Gama 2, Wil M.P. van der Aalst 1, and Marcos Sepúlveda 2 1 Architecture of Information Systems Group, Department of Mathematics and Computer Science, Eindhoven University of Technology, (The Netherlands) {h.m.w.verbeek,w.m.p.v.d.aalst}@tue.nl 2 Department of Computer Science, School of Engineering Pontificia Universidad Católica de Chile, (Chile) {walee,jmun}@uc.cl - {marcos}@ing.puc.cl Abstract. In the area of process mining, efficient alignment-based conformance checking is a hot topic. Existing approaches for conformance checking are typically monolithic and compute exact fitness values. One limitation with monolithic approaches is that it may take a significant amount of computation time in large processes. Alternatively, decomposition approaches run much faster but do not always compute an exact fitness value. This paper presents the tool Replay using Recomposition which returns the exact fitness value and the resulting alignments using the decomposition approach in an iterative manner. Other than computing the exact fitness value, users can configure the balance between result accuracy and computation time to get a fitness interval within set constraints, e.g., Give me the best fitness estimation you can find within 5 minutes. Keywords: process mining, conformance checking, business process management 1 Recomposing Conformance for Large Processes In conformance checking of process mining, alignment-based approaches have become the state-of-the-art technique due to their robustness and detailed view on deviations [1]. Unlike previous approaches, alignment-based conformance checking provides an optimal analysis and can pinpoint deviations between the observed behavior and the modeled behavior at the level of events, e.g., the identification of an executed activity in the process that is not permitted by the process model. However, with the growth in the volume of data and the size of event logs, there is a need to make alignment-based conformance checking more scalable. Case studies have shown that alignment-based conformance checking can take excessive time or even be unfeasible in large and complex processes. Furthermore, no result is provided if the technique is not completely feasible for the entire dataset.

2 J.Lee, H.Verbeek, J.Munoz-Gama, W.van der Aalst, and M.Sepúlveda One of the most promising research lines is to decompose the alignment problem. Rather than aligning the overall process and the overall event log, the two are decomposed into subprocesses and sublogs, and alignment is performed on these smaller subcomponents. Experimental results based on large scale applications of decomposition techniques have shown significant improvements in performance, especially in computation time [4, 6]. A recent work has presented a new technique, Recomposing Conformance [2], which not only applies decomposition to the alignment problem, but merges results from subcomponents as an overall result. Moreover, this merged result is guaranteed to correspond to the exact overall result that would have been yielded under the overall approach without decomposition. As such, there are three main contributions. First, the application of decomposition techniques can lead to significant reduction in computation time. Second, approximate interval results can be computed even if conformance checking is not completely feasible for the whole dataset. Third, users can configure different criteria such as the overall time threshold to adjust to the desired balance between accuracy and computation time. In this paper we introduce Replay using Recomposition, an implementation of the Recomposing Conformance technique. 2 Recomposing Conformance: The Tool Fig. 1. Overview of RecomposingConformance framework Replay using Recomposition has been implemented as a plugin in the ProM6.7 framework. Figure 1 provides a high level overview of the conformance checking framework. It takes in a process model (an Accepting Petrinet, i.e., a Petri net with initial and final markings) and an event log as input and returns a set of alignments and fitness scores as output. Recomposing Conformance is an iterative framework composed of three phases. First, a decomposition of the model and log is generated either by an initial decomposition or through a modification of the decomposition from the previous iteration. Second, subcomponents of the decomposition are aligned separately to achieve performance gains over the monolithic approach. Third, alignment results from subcomponents are merged if possible. Alignment for the subset of the log that could not merged in the current iteration is continued in the following iteration. We refer users to [2] for

Replay using Recomposition 3 further details on the formal guarantees, merging conditions, and other details of the technique. As previously mentioned, the Recomposing Conformance framework is highly configurable. Users can configure three aspects to balance the result accuracy and the required computation time. First, users can configure the initial decomposition of the model and log which is required to start the conformance checking process. Second, users can opt between two decomposed replay methods: normal decomposition or Hide&Reduce [5]. Third, the desired level of accuracy and computation time can be adjusted through trace rejection and termination conditions. For trace rejection, users can set the computation time and maximum conflict thresholds for individual traces so that traces that take a long time to align due to conformance issues are rejected to reduce computation time. For termination, users can set the overall computation time and maximum iteration thresholds, the target alignment percentage of the log and the target range for the approximate interval result (level of accuracy) so that the checking process can be terminated once the set time or accuracy is met. As previously mentioned, an approximate interval result is computed if conformance checking is not feasible for the whole dataset under the set constraints. Figure 2 shows a screenshot of a dialog in the plugin where users can configure the mentioned parameters and the resulting alignment that can be used for deviation analysis. Fig. 2. Configuration dialog in plugin (left) and resulting alignments (right) 3 A running example To showcase the tool, we take a running example and show what the tool does with it. Given the fact that the demo needs to be interactive, we can only show a small example here. Of course, for such a small example, the non-decomposed replay will also finish well within time, but if we take a larger example for which this replay does not finish within time, the demo will not be interactive any more. For these larger examples that show that the recomposing replay can be much faster than the non-decomposing replay, we refer the interested reader to [2]. For the small running example, we take the a12f0n05 event log from [3]. This log contains 12 different activities, and 35 different traces which contain

4 J.Lee, H.Verbeek, J.Munoz-Gama, W.van der Aalst, and M.Sepúlveda Fig. 3. Event log for the running example Fig. 4. Petri net for the running example some (5%) noise. Figure 3 shows a graphical overview of this event log, showing 35 different traces of which only the 7 different traces in the left-most column occur more than once in the log. Figure 4 shows the corresponding Petri net used to replay this event log. In the first iteration, the recomposing replay replays all 35 different traces on 12 clusters. For 14 out of these 35 different traces, the first iteration already provides us with a valid alignment. For the remaining 21 different traces, conflicts were found and another iteration is required. For the second iteration, the activity f is selected to be joined on. As a result, the clusters containing f will be merged, yielding only 10 clusters. Only 12 out of the 21 remaining different traces had a conflict on f, so only these 12 are replayed on these 10 clusters. From these 12, 5 provide us with a valid alignment, and 7 do not. Etcetera. Table 1 shows the complete results for the running example. This shows that only a single different trace required a replay on the entire (non-decomposed) net, and that all other 34 different traces could be replayed successfully on some decomposition. 4 Download, Screencast, and Examples The tool is available as the Replay using Recomposition plugin in the DecomposedReplayer package of ProM6.7 (www.promtools.org).the plugin is available as a silent version (default configurations) and a visually configurable version. The source code is also available in https://svn.win.tue.nl/trac/prom/browser/

Table 1. Results on the running example. Replay using Recomposition 5 Iteration Join on #Cluster #Replay #Accept #Reject #Conflict 1 12 35 14 0 21 2 f 10 12 19 0 16 3 c 9 6 20 0 15 4 e 8 4 21 0 14 5 S 7 3 23 0 12 6 j 6 5 26 0 9 7 b 5 3 29 0 6 8 E 4 2 31 0 4 9 d 4 2 33 0 2 10 g 3 1 34 0 1 11 k 1 1 35 0 0 Packages/DecomposedReplayer. A screencast showing the features of the tool and example datasets are also available at www.processmininguc.com/tools. 5 Conclusions This paper presented Replay using Recomposition, an implementation of the Recomposing Conformance technique. This alignment-based conformance checking technique applies decomposition to compute an overall result that corresponds to the exact result that would have been yielded under the monolithic approach. Furthermore, users can adjust configurations to balance between result accuracy and computation time. Unlike existing alignment-based conformance checking techniques, an approximate result is given if conformance checking is not feasible for the whole dataset under the set constraints. References 1. Adriansyah, A.: Aligning Observed and Modeled Behavior. Ph.D. thesis, Technische Universiteit Eindhoven (2014) 2. Lee, W.L.J., Verbeek, H., Munoz-Gama, J., van der Aalst, W.M.P., Sepúlveda, M.: Recomposing Conformance: Closing the Circle on Decomposed Alignment- Based Conformance Checking in Process Mining. (under review) (2017), processmininguc.com/publications 3. Maruster, L., Weijters, A.J.M.M., Aalst, W.M.P.v.d., Bosch, A.v.d.: A rule-based approach for process discovery: Dealing with noise and imbalance in process logs. Data Min. Knowl. Disc. 13(1), 67 87 (2006) 4. Munoz-Gama, J.: Conformance Checking and Diagnosis in Process Mining - Comparing Observed and Modeled Processes. Springer (2016) 5. Verbeek, H.M.W.: Decomposed replay using hiding and reduction. In: PNSE 2016 Workshop Proceedings. pp. 233 252 (2016) 6. Verbeek, H.M.W., Munoz-Gama, J., Aalst, W.M.P.v.d.: Divide And Conquer: A Tool Framework for Supporting Decomposed Discovery in Process Mining. The Computer Journal (2017)