Conformance Checking

In lesson 2, we described four types of process mining: process discovery, conformance checking, process enhancement, and operational support. After explaining techniques for the first type of process mining, we now focus on the second type: conformance checking. The input for conformance checking is an event log and a process model. The output consists of diagnostics describing commonalities and discrepancies between the modeled behavior and the observed behavior. The model may have been constructed by hand or may have been discovered. Automatically discovered models aim to be descriptive, and conformance checking can be used to judge the quality of the model. Hand-made models may be descriptive or normative. If the model is normative, then conformance checking is used to check compliance and the goal is not to evaluate the quality of the model, but to diagnose deviations from the normative process. For example, the event log can be replayed on top of the process model to find undesirable deviations suggesting fraud or inefficiencies.

Most conformance checking techniques try to replay the traces in the event log on the process model. A process model describes a possibly infinite collection of traces. If the observed trace in the event log is an element of the collection of traces described by the model, we say the trace is fitting. If the observed trace is not in this collection, the trace is deviating, or non-fitting. Using conformance checking, we can split the cases in the event log into two groups: fitting cases and non-fitting cases. If a case is non-fitting, then conformance checking can be used to generate explanations, for example, that the payment activity occurred before the order was approved, or that the order was paid twice.
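
The split into fitting and non-fitting cases can be sketched in a few lines of Python. This is a minimal illustration, not a real conformance checker: the order-handling model, its state names, and the three cases are all hypothetical, and the model is written as a small transition system whose loop on "change order" makes the collection of allowed traces infinite.

```python
# Hypothetical order-handling model as a transition system.
# Keys are (state, activity); values are the next state.
TRANSITIONS = {
    ("start", "create order"): "created",
    ("created", "change order"): "created",  # loop: infinitely many traces
    ("created", "approve order"): "approved",
    ("approved", "pay"): "paid",
    ("paid", "ship"): "end",
}
INITIAL_STATE = "start"
FINAL_STATES = {"end"}


def is_fitting(trace):
    """Return True if the trace is one of the traces the model allows."""
    state = INITIAL_STATE
    for activity in trace:
        next_state = TRANSITIONS.get((state, activity))
        if next_state is None:
            return False  # the model cannot mimic this event here
        state = next_state
    return state in FINAL_STATES  # the case must also complete properly


def split_log(log):
    """Split case ids into (fitting, non_fitting) sets."""
    fitting = {case_id for case_id, trace in log.items() if is_fitting(trace)}
    return fitting, set(log) - fitting


# Hypothetical event log: case id -> observed trace.
log = {
    "case-1": ["create order", "approve order", "pay", "ship"],
    "case-2": ["create order", "pay", "approve order", "ship"],  # paid too early
    "case-3": ["create order", "approve order", "pay", "pay", "ship"],  # paid twice
}
fitting, non_fitting = split_log(log)
```

Under these assumptions only case-1 is fitting. The check says that case-2 and case-3 deviate but not why; the replay and alignment techniques described next provide that explanation.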

There are two mainstream approaches to conformance checking. The first approach is token-based replay using the Petri net representation of the process model. The idea is to play the "Petri-net token game" following the trace in the event log. If the event log indicates that an activity took place, but the corresponding Petri net transition is not enabled because one of its input places is empty, then a missing token is recorded. Similarly, tokens left behind after the replay are recorded as remaining tokens. The larger the number of missing and remaining tokens, the bigger the compliance problem is. Moreover, it is possible to pinpoint the problematic parts in both the process model and the event log.
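
The token game can be sketched as follows. This is a simplified illustration under stated assumptions: the Petri net, its place names (start, p1, p2, p3, end), and the activity labels are hypothetical, every activity is assumed to map to exactly one transition, and the net is assumed to have a single source place and a single sink place.

```python
from collections import Counter

# Hypothetical Petri net: activity -> (input places, output places).
NET = {
    "create order": (["start"], ["p1"]),
    "approve order": (["p1"], ["p2"]),
    "pay": (["p2"], ["p3"]),
    "ship": (["p3"], ["end"]),
}


def replay(trace, net=NET, source="start", sink="end"):
    """Replay one trace and count produced, consumed, missing, remaining tokens."""
    marking = Counter({source: 1})  # the environment produces the first token
    produced, consumed, missing = 1, 0, 0
    missing_places = Counter()

    for activity in trace:
        inputs, outputs = net[activity]
        for place in inputs:
            if marking[place] == 0:
                # Transition is not enabled: record a missing token and
                # add it artificially so that the replay can continue.
                missing += 1
                missing_places[place] += 1
                marking[place] += 1
            marking[place] -= 1
            consumed += 1
        for place in outputs:
            marking[place] += 1
            produced += 1

    # At the end, the environment consumes the token from the sink place.
    if marking[sink] == 0:
        missing += 1
        missing_places[sink] += 1
        marking[sink] += 1
    marking[sink] -= 1
    consumed += 1

    remaining_places = +marking  # unary plus drops places with zero tokens
    return {
        "produced": produced,
        "consumed": consumed,
        "missing": missing,
        "remaining": sum(remaining_places.values()),
        "missing_places": dict(missing_places),
        "remaining_places": dict(remaining_places),
    }


ok = replay(["create order", "approve order", "pay", "ship"])
early_payment = replay(["create order", "pay", "approve order", "ship"])
```

In this sketch the early payment yields one missing token and one remaining token, both in place p2, which points at the approval step as the problematic part of the process.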

The second approach to conformance checking is based on alignments. If a trace is non-fitting, then the alignment indicates one of the closest traces allowed by the model. The alignment shows the difference between the observed trace and the best matching trace in the model. Moreover, using alignments, it is possible to relate each case in the event log to a path through the model. This is useful not only for conformance checking, but also for performance analysis, decision point analysis, and prediction.

Using a cost function, one can describe the severity of deviations. The default cost function assigns cost 1 to a so-called log-only-move or a model-only-move. A log-only-move is needed when an event in the event log cannot be mimicked by the process model. A model-only-move is needed when a step required by the model did not happen in reality.
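
A minimal sketch of an alignment under the default cost function is given below. It rests on a strong simplifying assumption: the model is represented by a finite list of allowed traces, whereas real alignment techniques search the state space of the model instead of enumerating its traces. The function names, the ">>" symbol for "no move", and the example traces are illustrative choices. Moves in which log and model agree are given cost 0.

```python
SKIP = ">>"  # marks the side that does not move


def align(observed, modeled, log_cost=1, model_cost=1):
    """Cheapest alignment of one observed trace with one model trace."""
    n, m = len(observed), len(modeled)
    # cost[i][j]: cheapest way to align observed[i:] with modeled[j:]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        cost[i][m] = cost[i + 1][m] + log_cost
    for j in range(m - 1, -1, -1):
        cost[n][j] = cost[n][j + 1] + model_cost
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best = min(cost[i + 1][j] + log_cost, cost[i][j + 1] + model_cost)
            if observed[i] == modeled[j]:
                best = min(best, cost[i + 1][j + 1])  # synchronous move, cost 0
            cost[i][j] = best

    # Walk forward through the table to recover one optimal sequence of moves.
    moves, i, j = [], 0, 0
    while i < n or j < m:
        if (i < n and j < m and observed[i] == modeled[j]
                and cost[i][j] == cost[i + 1][j + 1]):
            moves.append((observed[i], modeled[j]))  # synchronous move
            i, j = i + 1, j + 1
        elif i < n and cost[i][j] == cost[i + 1][j] + log_cost:
            moves.append((observed[i], SKIP))  # log-only-move
            i += 1
        else:
            moves.append((SKIP, modeled[j]))  # model-only-move
            j += 1
    return cost[0][0], moves


def best_alignment(observed, model_traces):
    """Pick the closest model trace (assumes a finite, enumerated model)."""
    return min((align(observed, t) for t in model_traces), key=lambda r: r[0])


model_traces = [["create order", "approve order", "pay", "ship"]]
observed = ["create order", "pay", "approve order", "ship"]
total_cost, moves = best_alignment(observed, model_traces)
```

For the early payment in this example, the alignment contains one log-only-move (the observed payment that the model cannot mimic at that point) and one model-only-move (the payment the model requires after approval), so the total cost is 2.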

Both the token-based replay approach and the alignment-based approach provide detailed diagnostics explaining the differences between the model and reality. It is also possible to quantify the correspondence between the process model and the event log in terms of a replay fitness measure. A replay fitness of 1 means that the event log is perfectly fitting. A replay fitness close to 0 means that the model and reality strongly disagree. The replay fitness can be seen as a health indicator. A lower than expected replay fitness value can be viewed from two angles: the real process deviates from the modeled process, or the modeled process does not capture reality well enough.
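
The text does not give a formula for replay fitness. One commonly used definition for token-based replay combines the four token counters (produced p, consumed c, missing m, remaining r) as 1/2 (1 - m/c) + 1/2 (1 - r/p). The sketch below assumes that definition; the function names and the example counts are illustrative.

```python
def replay_fitness(produced, consumed, missing, remaining):
    """Token-based replay fitness: 0.5 * (1 - m/c) + 0.5 * (1 - r/p)."""
    if produced <= 0 or consumed <= 0:
        raise ValueError("produced and consumed must be positive")
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)


def log_fitness(per_trace_counts):
    """Fitness of a whole log: sum the counters over all traces first."""
    totals = {key: sum(counts[key] for counts in per_trace_counts)
              for key in ("produced", "consumed", "missing", "remaining")}
    return replay_fitness(**totals)


# Hypothetical counters for two replayed traces.
fitting_trace = {"produced": 5, "consumed": 5, "missing": 0, "remaining": 0}
deviating_trace = {"produced": 5, "consumed": 5, "missing": 1, "remaining": 1}

trace_level = replay_fitness(**deviating_trace)
log_level = log_fitness([fitting_trace, deviating_trace])
```

With these counts the fitting trace scores 1, the deviating trace scores about 0.8, and the two-trace log scores about 0.9. Summing the counters before dividing means that longer traces weigh more heavily in the log-level value.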

Because conformance checking can be used to label cases as fitting or non-fitting, it can be combined with supervised learning techniques ranging from decision trees to neural networks. This way, it is possible to explain or even predict deviations.
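
As a rough illustration of this combination, the sketch below learns a decision stump, that is, a decision tree with a single split, from cases labeled by conformance checking. It is a stand-in for a full learner, and the case attributes ("channel", "customer") and all data values are hypothetical.

```python
def best_stump(cases, labels):
    """Find the rule 'non-fitting iff case[attribute] == value' with fewest errors.

    Returns (attribute, value, errors).
    """
    candidates = sorted({(attribute, value)
                         for case in cases
                         for attribute, value in case.items()})
    best = None
    for attribute, value in candidates:
        errors = sum(
            (case.get(attribute) == value) != (label == "non-fitting")
            for case, label in zip(cases, labels)
        )
        if best is None or errors < best[2]:
            best = (attribute, value, errors)
    return best


# Hypothetical case attributes, with labels produced by conformance checking.
cases = [
    {"channel": "web", "customer": "new"},
    {"channel": "web", "customer": "returning"},
    {"channel": "phone", "customer": "new"},
    {"channel": "phone", "customer": "returning"},
    {"channel": "web", "customer": "new"},
]
labels = ["fitting", "fitting", "non-fitting", "non-fitting", "fitting"]

rule = best_stump(cases, labels)
```

On this toy data the learned rule is that cases arriving by phone are non-fitting, with no misclassified cases. A rule of this kind explains past deviations and can flag new cases that are likely to deviate.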

Conformance checking will dramatically change the work of auditors and accountants. For example, auditors need to provide reasonable assurance that business processes are executed within the given set of boundaries and that numbers correspond to reality. This is typically done manually using a small sample of cases. Using process mining, this can be done automatically for all cases. Today, detailed information about processes is being recorded in the form of event logs, audit trails, transaction logs, databases, data warehouses, etc. Therefore, it should no longer be acceptable to check only a small sample of cases offline. Instead, all events in a business process can be evaluated, and this can be done while the process is still running.

Most commercial tools focus on process discovery. However, conformance checking and other more advanced types of process mining are rapidly gaining importance. In the next lesson, we discuss the remaining two types of process mining: process enhancement and operational support.

Written by

Wil van der Aalst