©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
Event Logs: What kind of data does process mining require?
prof.dr.ir. Wil van der Aalst www.processmining.org
Getting the "right" data …
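[Figure: process mining overview. The "world" (people, machines, organizations, components, business processes) is supported/controlled by a software system, which records events (e.g., messages, transactions, etc.) in event logs. A (process) model models and analyzes the world, and specifies, configures, implements, and analyzes the software system. Event logs and models are related through discovery, conformance, and enhancement.]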
Event log
• We assume the existence of an event log where each event refers to a case, an activity, and a point in time.
• An event log can be seen as a collection of cases.
• A case can be seen as a trace/sequence of events.
Event data can be found everywhere!
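The view of an event log as a collection of cases, each being a trace of events, can be sketched in a few lines of Python. This is only an illustration: the case ids and activity names are borrowed from the order handling example in these slides, while the timestamps and the `build_traces` helper are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events as (case id, activity, timestamp) triples.
# The timestamps are invented for illustration.
events = [
    ("9901", "register order", "2014-01-20T11:02:00"),
    ("9902", "register order", "2014-01-20T11:06:00"),
    ("9901", "check stock", "2014-01-20T11:15:00"),
    ("9902", "check stock", "2014-01-20T11:40:00"),
    ("9901", "ship order", "2014-01-20T12:10:00"),
    ("9902", "cancel order", "2014-01-20T13:00:00"),
]


def build_traces(events):
    """Group events by case id and order each case by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((datetime.fromisoformat(ts), activity))
    return {
        case_id: [activity for _, activity in sorted(case_events, key=lambda e: e[0])]
        for case_id, case_events in by_case.items()
    }


traces = build_traces(events)
for case_id, trace in traces.items():
    print(case_id, trace)
```

Each dictionary key is a case and each value is that case's trace, which is the form most discovery techniques start from.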
Event data may come from …
• a database system (e.g., patient data in a hospital),
• a comma-separated values (CSV) file or spreadsheet,
• a transaction log (e.g., a trading system),
• a business suite/ERP system (SAP, Oracle, etc.),
• a message log (e.g., from IBM middleware),
• an open API providing data from websites or social media, …
An example log
student name     course name                     exam date     mark
Peter Jones      Business Information systems    16-1-2014     8
Sandy Scott      Business Information systems    16-1-2014     5
Bridget White    Business Information systems    16-1-2014     9
John Anderson    Business Information systems    16-1-2014     8
Sandy Scott      BPM Systems                     17-1-2014     7
Bridget White    BPM Systems                     17-1-2014     8
Sandy Scott      Process Mining                  20-1-2014     5
Bridget White    Process Mining                  20-1-2014     9
John Anderson    Process Mining                  20-1-2014     8
…                …                               …             …
(case id)        (activity name)                 (timestamp)   (other data)
Another event log: order handling

order number   activity          timestamp           user          product    quantity
9901           register order    [email protected]    Sara Jones    iPhone5S   1
9902           register order    [email protected]    Sara Jones    iPhone5S   2
9903           register order    [email protected]    Sara Jones    iPhone4S   1
9901           check stock       [email protected]    Pete Scott    iPhone5S   1
9901           ship order        [email protected]    Sue Fox       iPhone5S   1
9903           check stock       [email protected]    Pete Scott    iPhone4S   1
9901           handle payment    [email protected]    Carol Hope    iPhone5S   1
9902           check stock       [email protected]    Pete Scott    iPhone5S   2
9902           cancel order      [email protected]    Carol Hope    iPhone5S   2
…              …                 …                   …             …          …
(case id)      (activity name)   (timestamp)         (resource)    (other data)
Another event log: patient treatment

patient     activity             timestamp           doctor        age    cost
5781        make X-ray           [email protected]    Dr. Jones     45     70.00
5541        blood test           [email protected]    Dr. Scott     61     40.00
5833        blood test           [email protected]    Dr. Scott     24     40.00
5781        blood test           [email protected]    Dr. Scott     45     40.00
5781        CT scan              [email protected]    Dr. Fox       45     1200.00
5833        surgery              [email protected]    Dr. Scott     24     2300.00
5781        handle payment       [email protected]    Carol Hope    45     0.00
5541        radiation therapy    [email protected]    Dr. Jones     61     140.00
5541        radiation therapy    [email protected]    Dr. Jones     61     140.00
…           …                    …                   …             …      …
(case id)   (activity name)      (timestamp)         (resource)    (other data)
Minimal requirements in terms of a CSV file
• Each row corresponds to an event.
• There are at least three columns:
  − Case id (patient id, order number, claim number, …)
  − Activity name (approve, reject, request, send, …)
  − Timestamp (2015-08-18T06:36:40, …)
• There may be many other (optional) columns: resource, transaction type, age, costs, etc.

order number   activity          timestamp           user          product    quantity
9901           register order    [email protected]    Sara Jones    iPhone5S   1
9902           register order    [email protected]    Sara Jones    iPhone5S   2
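The three mandatory columns can be checked mechanically before any mining is attempted. The sketch below uses only the Python standard library; the column names `case_id`, `activity`, and `timestamp`, the `read_event_log` helper, and the second sample timestamp are assumptions made for this illustration, not part of any standard.

```python
import csv
import io
from datetime import datetime

# Hypothetical names for the three mandatory columns.
REQUIRED = ("case_id", "activity", "timestamp")


def read_event_log(text):
    """Parse CSV text into event dicts, checking the three mandatory columns."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    events = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        if not row["case_id"] or not row["activity"]:
            raise ValueError(f"row {row_number}: empty case id or activity")
        # Raises ValueError if the timestamp is not in ISO 8601 form.
        row["timestamp"] = datetime.fromisoformat(row["timestamp"])
        events.append(row)
    return events


sample = """case_id,activity,timestamp,resource
9901,register order,2015-08-18T06:36:40,Sara Jones
9902,register order,2015-08-18T06:41:05,Sara Jones
"""
log = read_event_log(sample)
print(len(log), log[0]["timestamp"])
```

Optional columns such as `resource` pass through untouched, so they remain available as additional event attributes.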
Extensions
• Transactional information on activity instances: an event can represent a start, complete, suspend, resume, abort, etc.
• Case versus event attributes:
  − case attributes do not change, e.g., the birth date or gender of a patient,
  − event attributes are related to a particular step in the process.

[Figure: transactional life-cycle model of an activity instance. Transitions: schedule, assign, reassign, start, suspend, resume, autoskip, manualskip, complete, abort_case, abort_activity, withdraw. Outcomes: successful termination, unsuccessful termination.]
XES (eXtensible Event Stream)
• Adopted by the IEEE Task Force on Process Mining.
• The format is supported by tools such as ProM and Disco (used in this course).
• Predecessors: MXML and SA-MXML.
• Conversion from other formats (CSV) is easy if the right data are available.
• XML syntax and OpenXES library available.
• See www.xes-standard.org.
Simplistic view on event data
[Table: the order handling event log shown earlier, repeated as a flat table with one row per event and the columns order number (case id), activity (activity name), timestamp, user (resource), product, and quantity (other data).]
A more refined view
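[Figure: class diagram relating process, case, activity, activity instance, event, and attribute (e.g., timestamp, resource, costs, transaction). The classes are arranged on three levels: model level, instance level, and event level. Multiplicities and the association labels a to k are not recoverable from the extracted text.]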
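One possible reading of the refined view can be written down as plain data classes. Because the diagram itself did not survive extraction, the containment relations below (a process has activities and cases, a case has activity instances, an activity instance refers to one activity and has events, an event carries attributes) are an assumption, as are all class and field names and the sample timestamps.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Activity:  # model level
    name: str


@dataclass
class Event:  # event level
    transaction: str  # e.g. "start" or "complete"
    timestamp: datetime
    resource: Optional[str] = None
    costs: Optional[float] = None


@dataclass
class ActivityInstance:  # instance level: one execution of an activity in a case
    activity: Activity
    events: List[Event] = field(default_factory=list)


@dataclass
class Case:  # instance level
    case_id: str
    instances: List[ActivityInstance] = field(default_factory=list)


@dataclass
class Process:  # model level
    name: str
    activities: List[Activity] = field(default_factory=list)
    cases: List[Case] = field(default_factory=list)


# Hypothetical usage: one case with one activity instance and two events.
register = Activity("register order")
process = Process("order handling", activities=[register])
case = Case("9901")
process.cases.append(case)
instance = ActivityInstance(register)
instance.events.append(Event("start", datetime(2014, 1, 20, 11, 2), resource="Sara Jones"))
instance.events.append(Event("complete", datetime(2014, 1, 20, 11, 5), resource="Sara Jones"))
case.instances.append(instance)
print(len(case.instances[0].events))
```

Compared with the flat table, this structure makes explicit that several events (start, complete, and so on) can belong to the same activity instance.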
Transactional model for activities
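[Figure: transactional life-cycle model of an activity instance. Transitions: schedule, assign, reassign, start, suspend, resume, autoskip, manualskip, complete, abort_case, abort_activity, withdraw. Outcomes: successful termination, unsuccessful termination.]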
Atomic activities
[Figure: the transactional life-cycle model for activities, shown again for the case of atomic activities.]
Activities that have a duration
[Figure: the transactional life-cycle model for activities, shown again for the case of activities that have a duration.]
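When a log records the transaction type of each event, durations can be derived by pairing events. The sketch below assumes that an activity with a duration is logged as a start event followed by a complete event, and that an atomic activity is logged as a single complete event; the event tuples, the timestamps, and the `activity_durations` helper are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical events: (case id, activity, transaction type, timestamp).
events = [
    ("5781", "make X-ray", "start", "2014-01-23T10:30:00"),
    ("5781", "make X-ray", "complete", "2014-01-23T10:45:00"),
    ("5781", "blood test", "complete", "2014-01-23T11:00:00"),  # atomic: complete only
]


def activity_durations(events):
    """Pair each 'complete' with the earliest open 'start' of the same case and activity."""
    open_starts = {}
    durations = []
    # ISO 8601 strings in the same format sort chronologically.
    for case_id, activity, transition, ts in sorted(events, key=lambda e: e[3]):
        when = datetime.fromisoformat(ts)
        key = (case_id, activity)
        if transition == "start":
            open_starts.setdefault(key, []).append(when)
        elif transition == "complete":
            pending = open_starts.get(key)
            # No matching start: treat the activity as atomic (zero duration).
            started = pending.pop(0) if pending else when
            durations.append((case_id, activity, when - started))
    return durations


durations = activity_durations(events)
for case_id, activity, duration in durations:
    print(case_id, activity, duration)
```

Other transaction types (suspend, resume, and so on) are ignored here; a fuller treatment would subtract suspended intervals from the duration.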
Converting Event Data to XES
• Easy with tools like ProM and Disco! Import CSV file, then apply “Convert CSV to XES” plug-in.
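For readers who want to see what such a conversion produces, the following sketch builds a small XES-like document with Python's standard `xml.etree.ElementTree`. It is not the ProM or Disco plug-in: the input rows, their timestamps, and the `to_xes` helper are hypothetical, and a complete XES file would also declare the extensions behind the `concept:name`, `time:timestamp`, and `org:resource` keys, which this sketch omits.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows, as they might come out of a CSV reader.
rows = [
    {"case": "9901", "activity": "register order",
     "timestamp": "2014-01-20T11:02:00", "resource": "Sara Jones"},
    {"case": "9902", "activity": "register order",
     "timestamp": "2014-01-20T11:06:00", "resource": "Sara Jones"},
    {"case": "9901", "activity": "check stock",
     "timestamp": "2014-01-20T11:15:00", "resource": "Pete Scott"},
]


def to_xes(rows):
    """Build one <trace> per case and one <event> per row; return the XML as a string."""
    log = ET.Element("log", {"xes.version": "1.0"})
    traces = {}
    for row in rows:
        trace = traces.get(row["case"])
        if trace is None:
            trace = ET.SubElement(log, "trace")
            # The trace-level concept:name holds the case id.
            ET.SubElement(trace, "string", key="concept:name", value=row["case"])
            traces[row["case"]] = trace
        event = ET.SubElement(trace, "event")
        ET.SubElement(event, "string", key="concept:name", value=row["activity"])
        ET.SubElement(event, "date", key="time:timestamp", value=row["timestamp"])
        ET.SubElement(event, "string", key="org:resource", value=row["resource"])
    return ET.tostring(log, encoding="unicode")


xes = to_xes(rows)
print(xes)
```

The rows are assumed to be in chronological order already; events are appended to their trace in input order.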
Learn more about event logs
• Extensive introduction: Chapter 4 of Process Mining: Data Science in Action by W.M.P. van der Aalst, Springer-Verlag, 2016 (ISBN 978-3-662-49850-7)
• XES standard: www.xes-standard.org
• More advanced: W.M.P. van der Aalst. Extracting Event Data from Databases to Unleash Process Mining. In BPM - Driving Innovation in a Digital World, Springer-Verlag, Berlin, 2015. http://dx.doi.org/10.1007/978-3-319-14430-6_8
Example log files (XES)
• http://data.3tu.nl/repository/collection:event_logs_real
• http://data.3tu.nl/repository/collection:event_logs_synthetic
• http://www.win.tue.nl/ieeetfpm/doku.php?id=shared:process_mining_logs
• http://www.processmining.org/logs/start
Event data are everywhere! What is your excuse?