59
How People Really (Like To) Work prof.dr.ir. Wil van der Aalst 5th International Conference on Human-Centered Software Engineering (HCSE 2014), Paderborn, 16-9-2014. Comparative Process Mining to Unravel Human Behavior

How People Really (Like To) Work - uni-paderborn.de · 2014-09-24 · How People Really (Like To) Work prof.dr.ir. Wil van der Aalst 5th International Conference on Human-Centered

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

How People Really (Like To) Work

prof.dr.ir. Wil van der Aalst

5th International Conference on Human-Centered Software

Engineering (HCSE 2014), Paderborn, 16-9-2014.

Comparative Process Mining

to Unravel Human Behavior

process mining

Process mining: The missing link

PAGE 3

process mining

data-oriented analysis(data mining, machine learning, business intelligence)

process model analysis(simulation, verification, optimization, gaming, etc.)

performance-oriented

questions, problems and

solutions

compliance-oriented

questions, problems and

solutions

Positioning Process Mining

PAGE 4

process

discoveryprocess

mining

conformance

checking predictive

analytics

data

mining /

machine

learning

BPM

BI?

BPI?

PAGE 5

PAGE 6

Starting point for process mining:

Event data

student name course name exam date mark

Peter Jones Business Information systems 16-1-2014 8

Sandy Scott Business Information systems 16-1-2014 5

Bridget White Business Information systems 16-1-2014 9

John Anderson Business Information systems 16-1-2014 8

Sandy Scott BPM Systems 17-1-2014 7

Bridget White BPM Systems 17-1-2014 8

Sandy Scott Process Mining 20-1-2014 5

Bridget White Process Mining 20-1-2014 9

John Anderson Process Mining 20-1-2014 8

… … … …

PAGE 7

case id activity

name timestamp other data

every row is an event

(here: an exam attempt)

Another event log: order handling

order

number

activity timestamp user product quantity

9901 register order [email protected] Sara Jones iPhone5S 1

9902 register order [email protected] Sara Jones iPhone5S 2

9903 register order [email protected] Sara Jones iPhone4S 1

9901 check stock [email protected] Pete Scott iPhone5S 1

9901 ship order [email protected] Sue Fox iPhone5S 1

9903 check stock [email protected] Pete Scott iPhone4S 1

9901 handle payment [email protected] Carol Hope iPhone5S 1

9902 check stock [email protected] Pete Scott iPhone5S 2

9902 cancel order [email protected] Carol Hope iPhone5S 2

… … … … … …

PAGE 8 case id

activity

name timestamp other data resource

Another event log: patient treatment

patient activity timestamp doctor age cost

5781 make X-ray [email protected] Dr. Jones 45 70.00

5541 blood test [email protected] Dr. Scott 61 40.00

5833 blood test [email protected] Dr. Scott 24 40.00

5781 blood test [email protected] Dr. Scott 45 40.00

5781 CT scan [email protected] Dr. Fox 45 1200.00

5833 surgery [email protected] Dr. Scott 24 2300.00

5781 handle payment [email protected] Carol Hope 45 0.00

5541 radiation therapy [email protected] Dr. Jones 61 140.00

5541 radiation therapy [email protected] Dr. Jones 61 140.00

… … … … … …

PAGE 9

case id activity

name timestamp other data

resource

PAGE 10

www.olifantenpaadjes.nl

11

demo

PAGE 13

600+ plug-ins available covering the

whole process mining spectrum

PAGE 14

Quiz Question:

How to remove behavior?

PAGE 40

A

B

C

DE

p2

end

p4

p3p1

start

ABCD AED

ACBD

ABCD ABCD

ABCD AED ACBD

Answer:

Add places or remove transitions

PAGE 41

A

B

C

DE

p2

end

p4

p3p1

start

ABCD AED

ACBD

ABCD ABCD

ABCD AED ACBD

Quiz Question:

How to add behavior?

PAGE 42

A

B

C

DE

p2

end

p4

p3p1

start

Answer:

Add transitions or remove places!

PAGE 43

A

B

C

DE

p2

end

p4

p3p1

start

FABCD AED

AFBD

ABCD ABCD

BACD ABFD ACBD

BAFD

Process discovery algorithms (small selection)

PAGE 44

α algorithm

α++ algorithm

α# algorithm

language-based regions

state-based regions genetic mining

heuristic mining

hidden Markov models

neural networks

automata-based learning

stochastic task graphs

conformal process graph

mining block structures

multi-phase mining partial-order based mining

fuzzy mining

LTL mining

ILP mining

distributed genetic mining

ETM genetic algorithm Inductive Miner (infrequent)

Language based regions

PAGE 45

a1

a2

b1

b2

d

pR

e

c1

c

f

YX

Region R = (X,Y,c) corresponding to place pR: X = {a1,a2,c1} =

transitions producing a token for pR, Y = {b1,b2,c1} = transitions

consuming a token from pR, and c is the initial marking of pR.

Basic idea: enough tokens should be

present when consuming

PAGE 46

a1

a2

b1

b2

d

pR

e

c1

c

f

YX

A place is feasible if it

can be added without

disabling any of the

traces in the event log.

Process mining is about connecting things

• Data – Process

• Business – IT

• Business Intelligence – Business Process Management

• Performance – Compliance

• Runtime – Design time

• …

PAGE 47

Processes are not just about control-

flow!

PAGE 48

control-flow

perspective

resource

perspective

data-flow

perspective time

perspective cost

perspective

energy

perspective

Process mining spectrum

• Online and offline (near

realtime).

• All perspectives.

• De facto models are

descriptive/predictive.

• De jure models are

normative/prescriptive.

• Process discovery is just

one element: Aligning

model and reality is the

key thing.

PAGE 49

information system(s)

current

data

“world”people

machines

organizationsbusiness

processes documents

historic

data

resources/

organization

data/rules

control-flow

de jure models

resources/

organization

data/rules

control-flow

de facto models

provenance

event logs

models

“pre

mortem”

“post

mortem”

check

conformancediscover

enhance/

repair

performance

analysis

monitor

compliance

predict perf.,

risks, costs, ...

recommend

based on goal

process cubes

Process discovery is like applying

a function

PAGE 51

4.24

3.95

4.89

4.56 compute average

e1

e2

e3

create model

(freq., sum, variance, mode, …)

(alpha, heuristic, fuzzy, social,

dotted, compliance, etc., etc.)

2 dimensions: model = number

PAGE 52

hours: 50.5

number: 32 passed

failed

TU/e HSE QUT

hours: 48.5

number: 55

hours: 23.2

number: 49

hours: 23.5

number: 23

hours: 10.5

number: 4

hours: 24.5

number: 8

Process models are computed on two

dimensional event data

PAGE 53

passed

failed

TU/e HSE QUT

What are the differences?

PAGE 54

Example: Hertz has 8,650 rental locations

and different types of customers

PAGE 55

gold

silver

normal

Example: All Dutch municipalities need

to handle building permits

PAGE 56

>100k

50k

>50k & 100k

10 Dutch

municipalities

Example: Suncorp has different brands

and different types of insurance

PAGE 57

Example: students watching recorded

video lectures and making exams

PAGE 58

{1,2,3,4}

{5,6,7}

{8,9,10}

mark

Process Cubes (OLAP for processes)

PAGE 59

Cour

se_I

nsta

nce

Student.nationality

Stu

den

t.ge

nd

er

process cell

event

New behavior

process cube

cell sublog

case

s

time

male

German

Jan-Aug-2011

dimension

mal

efe

mal

e

Dutch Chinese German

process mining results per cell

Overview data for a particular course

PAGE 60

every horizontal line corresponds to a

student (287 students)

a red dots refers to an exam attempt

course instance running from January

2011 until August 2011

a non-red dots refers to a student viewing a particular lecture (each

lecture has a unique color)

time runs from left to right (period of

approximately 3 years)each dot refers to an event (6744 events in total)

Drilling-down into a course instance

PAGE 61

Dutch studentsDutch students International studentsInternational students

Stu

den

ts t

hat

pas

sed

Stu

den

ts t

hat

pas

sed

Stu

den

ts t

hat

did

no

t p

ass

Stu

den

ts t

hat

did

no

t p

ass

Comparing processes

PAGE 62

Fitness of event

log wrt idealized

model is 0.37

Fitness of event

log wrt idealized

model is 0.28

Overview of approach

PAGE 63

event data

process cube

event logs (per cell)

1

2

3

discovered models (per cell)

normative models

5

4

6

7

1. Store events in the

process cube.

2. Materialize the events

in a cell as an event log

that can be analyzed.

3. Automatically discover

models per cell (e.g., a

BPMN or UML model).

4. Check conformance by

replaying event data on

normative (process)

models.

5. Compare discovered

models and normative

models.

6. Compare discovered

models corresponding

to different cells.

7. Compare different

behaviors by replaying

event data of one cell

on another cell's model.

splitting event logs

PAGE 65

data Big data

Star Trek

What if?

PAGE 66

book car

c

add extra

insurance

dchange

booking

e

confirm initiate

check-in

j

check driver’s

license

k

charge credit

card

i

select car

g

supply

car

in

a

b

skip extra

insurance

f

h

add extra

insurance

skip extra

insurance

l

out

c1 c2

c3

c4

c5

c6

c7

c8

c9

c10

c11

acefgijkl

acddefhkjil

abdefjkgil

acdddefkhijl

acefgijkl

abefgjikl

...

processdiscovery

conformance checking

there are more than

1000 different

activities?

there are more

than 1.000.000

cases?

there are more

than 100.000.000

events?

Decompose event log! vertical or horizontal

PAGE 67

sets of

cases

sets of

activities

Vertical distribution I:

Split cases arbitrarily

PAGE 68

abcdegabdcefbcdeg

abdcegabcdefbcdegabdcefbdcegabcdefbdceg

abcdegabdceg

abdcefbdcefbdcegabcdeg

abcdefbcdefbdcegabcdefbdceg

abcdegabdceg

abdcefbcdegabcdeg

abcdegabdcefbcdeg

abdcegabcdefbcdegabdcefbdcegabcdefbdceg

abcdegabdceg

abdcefbdcefbdcegabcdeg

abcdefbcdefbdcegabcdefbdceg

abcdegabdceg

abdcefbcdegabcdeg

sets of

cases

Vertical distribution II:

Split cases based on a specific feature

PAGE 69

Horizontal distribution

PAGE 70

abcdegabdcefbcdeg

abdcegabcdefbcdegabdcefbdcegabcdefbdceg

abcdegabdceg

abdcefbdcefbdcegabcdeg

abcdefbcdefbdcegabcdefbdceg

abcdegabdceg

abdcefbcdegabcdeg

abegabefbeg

abegabefbegabefbegabefbeg

abegabeg

abefbefbegabeg

abefbefbegabefbeg

abegabeg

abefbegabeg

bcdebdcebcde

bdcebcdebcdebdcebdcebcdebdce

bcdebdce

bdcebdcebdcebcde

bcdebcdebdcebcdebdce

bcdebdce

bdcebcdebcde

sets of

activities

Horizontal distribution: The key idea

PAGE 71

abegabefbeg

abegabefbegabefbegabefbeg

abegabeg

abefbefbegabeg

abefbefbegabefbeg

abegabeg

abefbegabeg

bcdebdcebcde

bdcebcdebcdebdcebdcebcdebdce

bcdebdce

bdcebdcebdcebcde

bcdebcdebdcebcdebdce

bcdebdce

bdcebcdebcde

a b e

c1in

g

outc6

f

b

c

d

ec2

c3

c4

c5

projected on

a,b,e,f,g

projected on

b,c,d,e

Two foundational ways of spitting event

data: horizontal or vertical

PAGE 72

A

start complete

G

B

C

D

E

F

1: ACDEG2: BCFG3: BCFG

4: ACEDG5: ACFG

6: ACEDG7: BCEDG8: BCDEG

1: AC2: BC3: BC4: AC5: AC6: AC7: BC8: BC

A

G

B

C

D

E

FC

1: CDEG2: CFG3: CFG

4: CEDG5: CFG

6: CEDG7: CEDG8: CDEG

split

horizontally

A C F G

B C F G

B C

D

G

E

A C

D

G

E

1: ACDEG4: ACEDG6: ACEDG

7: BCEDG8: BCDEG

5: ACFG

2: BCFG3: BCFG

merge

horizontally

split

vertically

merge

vertically

PAGE 73

decomposed

process mining

Decomposing Conformance Checking

PAGE 74

SNprocessmodel

Levent log

decomposition technique

conformancechecking

technique

M1

submodel

L1sublog

conformance check

M2

submodel

L2sublog

conformance check

Mn

submodel

Lnsublog

conformance check

conformance diagnostics

decompose model

decompose event log

e.g., maximal decomposition, passage-based decomposition, or SESE/RPST-based decomposition

e.g., A* based alignments, token-based replay, or simple replay until first deviation

yields a (valid) activity partitioning

See "divide and conquer"

framework by Eric Verbeek.

Example of a valid decomposition

PAGE 75

a

start t1

SN1

a

c

e

c2

t1

t4

t6SN3

ab

d

e

c1 c3

t1

t2

t3

t5

t6

SN2

d

g

h

e

c5

f

t5

t6

t7 t8

t9

t10

c6

c7

SN5

c

d

c4

t4

t5

SN4

g

h

end

f

t8

t9

t10

t11c8

c9

SN6

a

start

b

c

d

g

h

e

end

c1

c2

c3

c4

c5t1

f

t2

t3

t4

t5

t6

t7 t8

t9

t10

t11

c6

c7 c8

c9

Log can be split in the same way!

Example of an alignment for observed

trace a,b,c,d,e,c,d,g,f

PAGE 76

a

start

b

c

d

g

h

e

end

c1

c2

c3

c4

c5t1

f

t2

t3

t4

t5

t6

t7 t8

t9

t10

t11

c6

c7 c8

c9

a

start t1

SN1

a

c

e

c2

t1

t4

t6SN3

ab

d

e

c1 c3

t1

t2

t3

t5

t6

SN2

d

g

h

e

c5

f

t5

t6

t7 t8

t9

t10

c6

c7

SN5

c

d

c4

t4

t5

SN4

g

h

end

f

t8

t9

t10

t11c8

c9

SN6

Etc.

a,b,c,d,e,c,d,g,f

Conformance checking can be

decomposed !!!

• General result for any valid decomposition: Any

event log or trace is perfectly fitting the overall

model if and only if it is also fitting all the individual

fragments

PAGE 77

a

start

b

c

d

g

h

e

end

c1

c2

c3

c4

c5t1

f

t2

t3

t4

t5

t6

t7 t8

t9

t10

t11

c6

c7 c8

c9

a

start t1

SN1

a

c

e

c2

t1

t4

t6SN3

ab

d

e

c1 c3

t1

t2

t3

t5

t6

SN2

d

g

h

e

c5

f

t5

t6

t7 t8

t9

t10

c6

c7

SN5

c

d

c4

t4

t5

SN4

g

h

end

f

t8

t9

t10

t11c8

c9

SN6

Wil van der Aalst, Decomposing Petri nets for process mining:

A generic approach. Distributed and Parallel Databases,

Volume 31, Issue 4, pp 471-507, 2013

Example (work with Jorge Munoz-Gama and Josep Carmona)

PAGE 78

prFm6

Decomposing Process Discovery

PAGE 79

Mprocessmodel

Levent log

decomposition technique

process discoverytechnique

M1

submodel

L1sublog

discovery

M2

submodel

L2sublog

Mn

submodel

Lnsublog

compose model

decompose event log

discovery discovery

e.g., causal graph based on frequencies is decomposed using passages or SESE/RPST

e.g., language/state-based region discovery, variants of alpha algorithm, genetic process mining

yields a (valid) activity partitioning

See "divide and conquer"

framework by Eric Verbeek.

conclusion

PAGE 81

Event data is everywhere!

Process are everywhere!

Why not connect them?

Process mining!

Challenges:

• comparative process mining

• Big event data, Big processes

Process Mining: Data Science in Action https://www.coursera.org/course/procmin

data mining

processmining

visualization

data science

behavioral/social

sciences

domainknowledge

machine learning

large scale distributedcomputing

statistics industrialengineering

databases

business process

intelligence

stochastics

privacy

algorithms

visual analytics

First Massive Open

Online Course (MOOC)

on Process Mining

PAGE 83

www.processmining.org

www.win.tue.nl/ieeetfpm/