Discovery Gap
- Reality vs. idealized process models
- Participants provide only incomplete and inconsistent knowledge
- Focus on happy path and planned exceptions
- Both "as-is" and "to-be" models are disconnected from what actually happens in the organization
Process Mining Steps
Logging
- Time: When did an event happen?
- Resource: Who was involved in the event?
- Activity: What was the event about?
- Case: Which process instance?
Capture enough trustworthy information to feed the mining algorithm
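A minimal sketch of a log record carrying the four attributes above. The class name `Event`, the helper `is_usable` and the sample values are assumptions made for illustration, not part of any particular logging system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One log entry with the four attributes listed above."""
    case: str        # which process instance
    activity: str    # what the event was about
    resource: str    # who was involved
    time: datetime   # when the event happened


# Hypothetical sample log for an order-handling process.
log = [
    Event("order-1", "register", "alice", datetime(2024, 1, 8, 9, 0)),
    Event("order-1", "ship", "bob", datetime(2024, 1, 8, 11, 30)),
]


def is_usable(event: Event) -> bool:
    """Assumed sanity check: reject events with an empty attribute."""
    return all([event.case, event.activity, event.resource, event.time])
```

A check like `is_usable` is only a first filter; whether the recorded values can be trusted still depends on how the log was produced.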
Log Granularity
- Coarse-grained: one event per activity (instantaneous tasks)
- Fine-grained: one event for each state transition of each activity (ready, waiting, started, completed)
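The two granularities can be related as follows: a fine-grained log can be reduced to a coarse-grained one by keeping a single state transition per activity, while only the fine-grained log allows measuring how long an activity was actually worked on. This is a sketch; the tuple layout and sample data are hypothetical.

```python
# Fine-grained log: (case, activity, lifecycle state, time in minutes).
fine_grained = [
    ("c1", "check", "ready", 0),
    ("c1", "check", "started", 5),
    ("c1", "check", "completed", 12),
    ("c1", "approve", "ready", 12),
    ("c1", "approve", "started", 20),
    ("c1", "approve", "completed", 21),
]


def to_coarse(events, keep_state="completed"):
    """Keep one event per activity by selecting a single lifecycle state."""
    return [(case, activity, t)
            for case, activity, state, t in events
            if state == keep_state]


def service_times(events):
    """Time between 'started' and 'completed' for each (case, activity)."""
    started, durations = {}, {}
    for case, activity, state, t in events:
        if state == "started":
            started[(case, activity)] = t
        elif state == "completed" and (case, activity) in started:
            durations[(case, activity)] = t - started[(case, activity)]
    return durations


coarse = to_coarse(fine_grained)
```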
Where to find events?
- As they happen:
  - Intercept and record every action and interaction
  - Force people to report task completions
- After the fact:
  - Reconstruct events from historical databases
  - Look for timestamps in schemas and map them to an event
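As an illustration of the "after the fact" approach, the sketch below maps each timestamp column of a table to one event. The `orders` table, its column names and the activity labels are all hypothetical; a real schema needs its own mapping.

```python
import sqlite3

# Hypothetical mapping from timestamp columns to activity names.
COLUMN_TO_ACTIVITY = {
    "created_at": "create order",
    "paid_at": "receive payment",
    "shipped_at": "ship order",
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id TEXT, created_at TEXT, paid_at TEXT, shipped_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("o1", "2024-01-08T09:00", "2024-01-08T09:30", "2024-01-09T14:00"),
        ("o2", "2024-01-08T10:00", None, None),  # still open: not paid yet
    ],
)


def reconstruct_events(connection):
    """Turn every non-null timestamp column into a (case, activity, time) event."""
    columns = list(COLUMN_TO_ACTIVITY)
    rows = connection.execute(f"SELECT id, {', '.join(columns)} FROM orders")
    events = []
    for case_id, *timestamps in rows:
        for column, ts in zip(columns, timestamps):
            if ts is not None:
                events.append((case_id, COLUMN_TO_ACTIVITY[column], ts))
    # ISO 8601 strings of equal format sort chronologically.
    return sorted(events, key=lambda e: (e[0], e[2]))


events = reconstruct_events(conn)
```

Here the row identifier doubles as the case identifier, and a missing timestamp simply produces no event, which is also how an unfinished case shows up in the reconstructed log.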
Extract the process model by mining the logs
Monitor the process execution
Mining Outcome
- Process Model: control flow with partial ordering of events
- Social Network: based on the frequency of handovers
- Decision rules: branch probability based on data/known state
- Performance: activity and process duration statistics, resource utilization
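Two of these outcomes can be read off a log with very little code. The sketch below counts handovers between resources (the raw material for a social network) and computes the duration of each case. The log layout and values are hypothetical, and the log is assumed to be sorted by case and then by time.

```python
from collections import Counter

# Hypothetical log: (case, activity, resource, time in hours).
log = [
    ("c1", "register", "alice", 0),
    ("c1", "check", "bob", 2),
    ("c1", "decide", "carol", 5),
    ("c2", "register", "alice", 1),
    ("c2", "check", "bob", 4),
    ("c2", "decide", "bob", 6),
]


def handovers(events):
    """Count how often work passes from one resource to another within a case."""
    counts = Counter()
    for prev, curr in zip(events, events[1:]):
        same_case = prev[0] == curr[0]
        if same_case and prev[2] != curr[2]:
            counts[(prev[2], curr[2])] += 1
    return counts


def case_durations(events):
    """Process duration per case: last timestamp minus first timestamp."""
    first, last = {}, {}
    for case, _activity, _resource, t in events:
        first.setdefault(case, t)
        last[case] = t
    return {case: last[case] - first[case] for case in first}
```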
Process Mining Algorithm
Assumptions
- Every event belongs to one process instance (how to find the corresponding instance?)
- Every logged event carries a timestamp (the time of the event or the time when the event was logged?)
- The log contains only completed processes (how to filter incomplete ones?)
- The log contains all possible events and event pairs (behavioral completeness)
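One possible answer to the question of filtering incomplete instances, sketched under the assumption that the set of end activities is known in advance: keep only the cases whose latest event is one of them. The log and the end activity `archive` are hypothetical.

```python
from collections import defaultdict

# Hypothetical log of (case, activity, time).
log = [
    ("c1", "register", 1), ("c1", "decide", 2), ("c1", "archive", 3),
    ("c2", "register", 2), ("c2", "decide", 4),
]
END_ACTIVITIES = {"archive"}  # assumed to be known for this process


def completed_cases(events, end_activities):
    """Keep only the events of cases whose latest event is an end activity."""
    by_case = defaultdict(list)
    for case, activity, t in events:
        by_case[case].append((t, activity))
    done = {
        case for case, entries in by_case.items()
        if max(entries)[1] in end_activities
    }
    return [e for e in events if e[0] in done]


complete_log = completed_cases(log, END_ACTIVITIES)
```

When the end activities are not known, other heuristics are needed (for example, treating cases with no activity for a long time as finished), and each of them can misclassify cases.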
Basic Mining Algorithm
- Abstract event log: identify event sequences
- Determine partial order of event pairs
- Classify causal, parallel and non-succession relationships
- Build the footprint matrix
- Reconstruct the control flow
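The first step, abstracting the event log into event sequences, could look like the sketch below: events are grouped by case, ordered by timestamp, and reduced to their activity names. The interleaved sample log is hypothetical.

```python
from collections import defaultdict

# Hypothetical log of (case, activity, time); events of two cases interleave.
log = [
    ("c1", "a", 1), ("c2", "a", 2), ("c1", "b", 3),
    ("c2", "c", 4), ("c1", "c", 5), ("c2", "b", 6),
    ("c1", "d", 7), ("c2", "d", 8),
]


def to_traces(events):
    """Group events by case and order them by time, keeping only activity names."""
    by_case = defaultdict(list)
    for case, activity, t in events:
        by_case[case].append((t, activity))
    return {
        case: tuple(activity for _t, activity in sorted(entries))
        for case, entries in by_case.items()
    }


traces = to_traces(log)
```

The remaining steps only look at these sequences, so resources and absolute timestamps are no longer needed after this point.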
Order Relationship
a>b
task a is directly followed by task b in at least one event sequence
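A small sketch of how the order relationship can be collected from a set of event sequences; the function name and the two sample sequences are assumptions for illustration.

```python
def directly_follows(traces):
    """Return every pair (a, b) where b immediately follows a in some trace."""
    return {(a, b) for trace in traces for a, b in zip(trace, trace[1:])}


# Hypothetical event sequences.
traces = [("a", "b", "c", "d"), ("a", "c", "b", "d")]
follows = directly_follows(traces)
```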
Causality Relationship
a→b := a>b ∧ !(b>a)
task a is directly followed by task b,
and task a never directly follows task b
Parallelism Relationship
a∥b := a>b ∧ b>a
task a both directly follows and directly precedes task b
No-Direct-Succession Relationship
a#b := !(a>b) ∧ !(b>a)
task a never directly follows task b, and task b never directly follows task a
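The three definitions translate directly into a classification function. In this sketch `follows` stands for the set of pairs (a, b) with a>b; the sample set is hypothetical, and the label "reverse causal" is added here only to name the case b→a.

```python
def classify(a, b, follows):
    """Classify the pair (a, b) given the set of directly-follows pairs."""
    ab, ba = (a, b) in follows, (b, a) in follows
    if ab and ba:
        return "parallel"              # a∥b
    if ab:
        return "causal"                # a→b
    if ba:
        return "reverse causal"        # b→a
    return "no direct succession"      # a#b


# Hypothetical order relation, e.g. from the sequences abcd and acbd.
follows = {("a", "b"), ("b", "c"), ("c", "d"), ("a", "c"), ("c", "b"), ("b", "d")}
```

For this relation, `classify("a", "b", follows)` yields "causal", `classify("b", "c", follows)` yields "parallel", and `classify("a", "d", follows)` yields "no direct succession".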
Footprint Matrix
Complete classification of all possible event pairs according to the causality, parallelism and no-direct-succession relationships
Input for the control flow reconstruction algorithm
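A minimal sketch of building the footprint matrix from event sequences. The symbols →, ∥ and # follow the definitions above; ← is used here as an assumption to mark the reverse of a causality (the column task causes the row task). Function names and sample sequences are hypothetical.

```python
def footprint(traces):
    """Build the footprint matrix as a dict keyed by (row task, column task)."""
    follows = {(a, b) for trace in traces for a, b in zip(trace, trace[1:])}
    activities = sorted({a for trace in traces for a in trace})
    matrix = {}
    for a in activities:
        for b in activities:
            ab, ba = (a, b) in follows, (b, a) in follows
            matrix[(a, b)] = "∥" if ab and ba else "→" if ab else "←" if ba else "#"
    return activities, matrix


def render(activities, matrix):
    """Format the matrix as text lines, one row per task."""
    header = "  " + " ".join(activities)
    rows = [a + " " + " ".join(matrix[(a, b)] for b in activities)
            for a in activities]
    return [header] + rows


activities, matrix = footprint([("a", "b", "c", "d"), ("a", "c", "b", "d")])
lines = render(activities, matrix)
```

For the two sample sequences the rendered rows are `a # → → #`, `b ← # ∥ →`, `c ← ∥ # →` and `d # ← ← #`: a causes both b and c, b and c are parallel, and both cause d.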
Challenges
How to deal with:
- Incomplete input event log (only positive examples)
- Unclear separation between regular and exceptional behavior
- Large amounts of noisy data
- Underfitting, fitting or overfitting?
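One simple heuristic for noisy logs, shown here only as an illustration and not as the method of any particular mining algorithm, is to drop directly-follows pairs that occur rarely compared with the most frequent pair. The threshold `min_share` and the sample log are assumptions.

```python
from collections import Counter


def frequent_follows(traces, min_share=0.1):
    """Keep a>b pairs whose count is at least min_share of the most frequent pair."""
    counts = Counter(
        (a, b) for trace in traces for a, b in zip(trace, trace[1:])
    )
    if not counts:
        return set()
    cutoff = min_share * max(counts.values())
    return {pair for pair, n in counts.items() if n >= cutoff}


# Hypothetical log: 50 regular sequences and one with two events swapped.
traces = [("a", "b", "c")] * 50 + [("a", "c", "b")]
kept = frequent_follows(traces)
```

The choice of threshold is a trade-off: a high value may discard rare but legitimate behavior, while a low value keeps the noise in the model.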
Wil van der Aalst
BPM Lifecycle
References
- Marlon Dumas, Marcello La Rosa, Jan Mendling, Hajo Reijers, Fundamentals of Business Process Management, Chapter 10, Springer, 2013, ISBN 978-3-642-33142-8
- Wil M. P. van der Aalst, Process Mining: Discovery, Conformance and Enhancement of Business Processes, Springer, 2011
- Jan-Dirk van der Burg, Desire Lines/Olifantenpaadjes