Process Mining
An index to the state of the art and an outline of open research challenges at DIIAG

Claudio Di Ciccio, Massimo Mecella

Seminars in Software and Services for the Information Society
Rome, May 7th, 2012

Page 2: Claudio Di  Ciccio , Massimo  Mecella

Process Mining

• Process Mining [Aalst2011.book], also referred to as Workflow Mining, is the set of techniques that allow the extraction of process descriptions, stemming from a set of recorded real executions (logs).

• ProM [AalstEtAl2009] is one of the most widely used plug-in-based software environments for implementing workflow mining (and other) techniques.

• The new version 6.0 is available for download at www.processmining.org

Process MiningClaudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma) P. 2

Definition

Page 3: Claudio Di  Ciccio , Massimo  Mecella

Process Mining

• Process Mining involves:

• Process discovery

• Control flow mining, organizational mining, decision mining;

• workflow

• Conformance checking

• Operational support

• We will focus on control flow mining

• Many control flow mining algorithms have been proposed:

• α [AalstEtAl2004] and α++ [WenEtAl2007]

• Fuzzy [GüntherEtAl2007]

• Heuristic [WeijtersEtAl2001]

• Genetic [MedeirosEtAl2007]

• Two-step [AalstEtAl2010]

• …


Definition

Page 4: Claudio Di  Ciccio , Massimo  Mecella

Process Mining

• The rest of the lesson is based on the following material:

• Van der Aalst, W. M. P.: “Process Discovery: An Introduction”

• Available at http://www.processmining.org/_media/processminingbook/process_mining_chapter_05_process_discovery.pdf

• From the teaching material for [Aalst2011.book]

• De Medeiros, A. K. A.: “Process Mining: Control-Flow Mining Algorithms”

• Available at http://www.processmining.org/_media/courses/processmining/lecture3_controlflowminingalgorithms.ppt


Further reading

Page 5: Claudio Di  Ciccio , Massimo  Mecella

A different context (1)

• Artful processes [HillEtAl06]

– informal processes typically carried out by those people whose work is mental rather than physical (managers, professors, researchers, engineers, etc.)

• “knowledge workers” [ACTIVE09]

• Knowledge workers create artful processes “on the fly”

• Though artful processes are frequently repeated, they are not exactly reproducible, even by their originators, nor can they be easily shared.

Artful processes and knowledge workers


Page 6: Claudio Di  Ciccio , Massimo  Mecella

A different context (2)

• In collaborative contexts, knowledge workers share their information and outcomes with other knowledge workers

– E.g., a software development manager

• Typically, by means of several email conversations

– Email conversations are actual traces of running processes that knowledge workers adhere to

Email conversations


Page 7: Claudio Di  Ciccio , Massimo  Mecella

A different context (3)

• From a collection of email messages, you can extract the processes that lie behind them

– Related email conversations are traces of their runs

• Valuable advantages for users

– Automated discovery of formal representations

• with no effort for knowledge workers

– Tidy organization of naïve best practices otherwise kept only in mind

– Opportunity to share and compare knowledge on methodologies

– Automated discovery of bottlenecks, delays, structural defects

• from the analysis of previous runs

• Email conversations are a kind of semi-structured text

– this approach is not tailored to electronic mail only

• it can be extended to the analysis of other semi-structured texts

Processes from email conversations


Page 8: Claudio Di  Ciccio , Massimo  Mecella

A different context (4)

• Personal information management (PIM)

– how to organize one’s own activities, contacts, etc. through the usage of software

• Information warfare

– in supporting anti-crime intelligence agencies

• Enterprise engineering

– for knowledge-heavy industries, where preserving documents making up product data is not enough

• eHealth

– for the automatic discovery of medical treatment procedures on top of patient health records

Some areas of applicability


Page 9: Claudio Di  Ciccio , Massimo  Mecella

MailOfMine

MailOfMine is the approach and the implementation of a collection of techniques whose aim is to automatically build, on top of a collection of email messages, a set of workflow models that represent the artful processes lying behind the knowledge workers’ activities.

[DiCiccioEtAl11]

[DiCiccioMecella12]

[DiCiccioMecella/TR12]


What is MailOfMine?

Page 10: Claudio Di  Ciccio , Massimo  Mecella

On the visualization of processes


The imperative model

• Represents the whole process at once

• The most used notation is based on a subclass of Petri Nets (namely, the Workflow Nets)


Page 11: Claudio Di  Ciccio , Massimo  Mecella

On the visualization of processes

• Rather than using a procedural language for expressing the allowed sequence of activities, it is based on the description of workflows through the usage of constraints

• the idea is that every task can be performed, except the ones which do not respect such constraints

• this technique fits with processes that are highly flexible and subject to changes, such as artful processes

The declarative model

– If A is performed, B must be performed, no matter whether before or afterwards (responded existence)

– Whenever B is performed, C must be performed afterwards, and B cannot be repeated until C is done (alternate response)

The notation here is based on [AalstEtAl06, MaggiEtAl11] (DecSerFlow, Declare)


Page 12: Claudio Di  Ciccio , Massimo  Mecella

On the visualization of processes

Imperative vs. declarative

Imperative

Declarative

Declarative models work better in the presence of a partial specification of the process scheme


Page 13: Claudio Di  Ciccio , Massimo  Mecella

Declare constraint templates

Existence templates

• Existence(n, A): activity A occurs at least n times in the process instance
– for n = 2: BCAAC ✓, BCAAAC ✓, BCAC ✗

• Absence(A): activity A does not occur in the process instance
– BCC ✓, BCAC ✗

• Absence(n+1, A): activity A occurs at most n times in the process instance
– for n = 1: BCAAC ✗, BCAC ✓, BCC ✓

• Exactly(n, A): activity A occurs exactly n times in the process instance
– for n = 2: BCAAC ✓, BCAAAC ✗, BCAC ✗

• Init(A): activity A is the first to occur in each process instance
– BCAAC ✗, ACAAAC ✓, BCC ✗
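The existence templates can be checked on a single trace by counting occurrences. The following is a minimal sketch, assuming that a trace is a Python string in which every character is one activity (as in the examples on this slide) and assuming the usual Declare reading in which Absence(m, A) allows at most m − 1 occurrences of A. The function names are illustrative and do not come from MailOfMine.

```python
def existence(trace: str, a: str, n: int = 1) -> bool:
    """Existence(n, A): A occurs at least n times in the trace."""
    return trace.count(a) >= n


def absence(trace: str, a: str, m: int = 1) -> bool:
    """Absence(m, A): A occurs at most m - 1 times (assumed convention).

    With the default m = 1 this is the plain Absence(A) template.
    """
    return trace.count(a) < m


def exactly(trace: str, a: str, n: int) -> bool:
    """Exactly(n, A): A occurs exactly n times in the trace."""
    return trace.count(a) == n


def init(trace: str, a: str) -> bool:
    """Init(A): A is the first activity of the trace."""
    return trace.startswith(a)
```

For instance, `existence("BCAAAC", "A", 2)` holds because A occurs three times, while `init("BCC", "A")` does not because the trace starts with B.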


Page 14: Claudio Di  Ciccio , Massimo  Mecella

Declare constraint templates

Relation templates

• RespondedExistence(A, B): if A occurs in the process instance, then B occurs as well
– CAC ✗, CAACB ✓, BCAC ✓, BCC ✓

• Response(A, B): if A occurs in the process instance, then B occurs after A
– BCAAC ✗, CAACB ✓, CAC ✗, BCC ✓

• AlternateResponse(A, B): each time A occurs in the process instance, B occurs afterwards, before A recurs
– BCAAC ✗, CAACB ✗, CACB ✓, CABCA ✗, BCC ✓, CACBBAB ✓

• ChainResponse(A, B): each time A occurs in the process instance, B occurs immediately afterwards
– BCAAC ✗, BCAABC ✗, BCABABC ✓
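The response family can be checked with a single left-to-right scan of the trace. This is a minimal sketch under the same assumption as the examples above, namely that a trace is a string of single-character activities with A different from B. The function names are hypothetical and are not part of any existing tool.

```python
def responded_existence(trace: str, a: str, b: str) -> bool:
    """If A occurs, B occurs as well (before or after)."""
    return a not in trace or b in trace


def response(trace: str, a: str, b: str) -> bool:
    """Every A is eventually followed by a B.

    It is enough to look at the last A: if a B follows it,
    a B also follows every earlier A.
    """
    last_a = trace.rfind(a)
    return last_a == -1 or b in trace[last_a + 1:]


def alternate_response(trace: str, a: str, b: str) -> bool:
    """Every A is followed by a B before the next A occurs."""
    pending = False  # True while an A is still waiting for its B
    for event in trace:
        if event == a:
            if pending:
                return False  # A recurred before B answered
            pending = True
        elif event == b:
            pending = False
    return not pending


def chain_response(trace: str, a: str, b: str) -> bool:
    """Every A is immediately followed by a B."""
    return all(trace[i + 1:i + 2] == b
               for i, event in enumerate(trace) if event == a)
```

Applied to the traces listed above, `alternate_response("CACBBAB", "A", "B")` holds, whereas `alternate_response("CABCA", "A", "B")` fails because the final A is never answered by a B.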


Page 15: Claudio Di  Ciccio , Massimo  Mecella

Declare constraint templates

Relation templates

• RespondedExistence(B, A): if B occurs in the process instance, then A occurs as well
– CAC ✓, CAACB ✓, BCAC ✓, BCC ✗

• Precedence(A, B): B occurs in the process instance only if preceded by A
– BCAAC ✗, CAACB ✓, CAC ✓, BCC ✗

• AlternatePrecedence(A, B): each time B occurs in the process instance, it is preceded by A and no other B can recur in between
– BCAAC ✗, CAACB ✓, CACB ✓, CABCA ✓, BCC ✗, CACBAB ✓

• ChainPrecedence(A, B): each time B occurs in the process instance, A occurs immediately beforehand
– BCAAC ✗, BCAABC ✗, CABABCA ✓
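The precedence family mirrors the response family: the check looks backwards from each B instead of forwards from each A. A minimal sketch follows, again assuming that traces are strings of single-character activities with A different from B. The function names are illustrative only.

```python
def precedence(trace: str, a: str, b: str) -> bool:
    """B occurs only if an A occurred earlier.

    It is enough to look at the first B: if an A precedes it,
    an A also precedes every later B.
    """
    first_b = trace.find(b)
    return first_b == -1 or a in trace[:first_b]


def alternate_precedence(trace: str, a: str, b: str) -> bool:
    """Every B is preceded by an A, with no other B in between."""
    armed = False  # True once an A has been seen since the last B
    for event in trace:
        if event == a:
            armed = True
        elif event == b:
            if not armed:
                return False  # B without a fresh A before it
            armed = False
    return True


def chain_precedence(trace: str, a: str, b: str) -> bool:
    """Every B is immediately preceded by an A."""
    return all(i > 0 and trace[i - 1] == a
               for i, event in enumerate(trace) if event == b)
```

For example, `chain_precedence("CABABCA", "A", "B")` holds because both occurrences of B come right after an A, while `precedence("BCAAC", "A", "B")` fails because the trace opens with B.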


Page 16: Claudio Di  Ciccio , Massimo  Mecella

Declare constraint templates

Relation templates

• CoExistence(A, B): if B occurs in the process instance, then A occurs, and vice versa
– CAC ✗, CAACB ✓, BCAC ✓, BCC ✗

• Succession(A, B): A occurs if and only if it is followed by B in the process instance
– BCAAC ✗, CAACB ✓, CAC ✗, BCC ✗

• AlternateSuccession(A, B): A and B occur in the process instance if and only if the latter follows the former, and they alternate with each other in the trace
– BCAAC ✗, CAACB ✗, CACB ✓, CABCA ✗, BCC ✗, CACBAB ✓

• ChainSuccession(A, B): A and B occur in the process instance if and only if the latter immediately follows the former
– BCAAC ✗, BCAABC ✗, CABABC ✓


Page 17: Claudio Di  Ciccio , Massimo  Mecella

Declare constraint templates

Negative relation templates

• NotCoExistence(A, B): A and B never occur together in the process instance
– CAC ✓, CAACB ✗, BCAC ✗, BCC ✓

• NotSuccession(A, B): A can never occur before B in the process instance
– BCAAC ✓, CAACB ✗, CAC ✓, BCC ✓

• NotChainSuccession(A, B): A and B occur in the process instance if and only if the latter does not immediately follow the former
– BCAAC ✓, BCAABC ✗, CBACBA ✓
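The negative templates forbid a pattern rather than require one, so each check reduces to searching for the forbidden pattern. The sketch below assumes traces are strings of single-character activities. The function names are hypothetical.

```python
def not_coexistence(trace: str, a: str, b: str) -> bool:
    """A and B never occur in the same trace."""
    return not (a in trace and b in trace)


def not_succession(trace: str, a: str, b: str) -> bool:
    """No B occurs after any A.

    Checking the suffix after the first A covers every later A too.
    """
    first_a = trace.find(a)
    return first_a == -1 or b not in trace[first_a + 1:]


def not_chain_succession(trace: str, a: str, b: str) -> bool:
    """B never immediately follows A (the substring "AB" is forbidden)."""
    return (a + b) not in trace
```

For instance, `not_chain_succession("CBACBA", "A", "B")` holds because B never comes right after A, while `not_succession("CAACB", "A", "B")` fails because a B follows an A.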


Page 18: Claudio Di  Ciccio , Massimo  Mecella

Relation constraint templates subsumption


Constraint templates are not independent of each other


Page 19: Claudio Di  Ciccio , Massimo  Mecella

Relation constraint templates subsumption

E.g.,

• A trace like ABABCABCC satisfies (w.r.t. A and B):

• RespondedExistence(A, B), RespondedExistence(B, A), CoExistence(A, B), CoExistence(B, A), Response(A, B), AlternateResponse(A, B), ChainResponse(A, B), Precedence(A, B), AlternatePrecedence(A, B), ChainPrecedence(A, B), Succession(A, B), AlternateSuccession(A, B), ChainSuccession(A, B)

• The mining algorithm would show only the strictest constraint (ChainSuccession(A, B))

• MINERful, the mining algorithm of MailOfMine, tackles this open issue
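Keeping only the strictest constraints can be sketched as pruning over a subsumption graph. The code below is an illustration, not MINERful's actual procedure: the `IMPLIES` map encodes the subsumption hierarchy as it is commonly described for these templates (an assumption here, since the diagram of the previous slide is not reproduced in the text), and the names `descendants` and `strictest` are hypothetical.

```python
# Assumed subsumption hierarchy: each template implies the listed weaker ones.
IMPLIES = {
    "ChainSuccession": ["AlternateSuccession", "ChainResponse", "ChainPrecedence"],
    "AlternateSuccession": ["Succession", "AlternateResponse", "AlternatePrecedence"],
    "Succession": ["CoExistence", "Response", "Precedence"],
    "CoExistence": ["RespondedExistence(A, B)", "RespondedExistence(B, A)"],
    "ChainResponse": ["AlternateResponse"],
    "AlternateResponse": ["Response"],
    "Response": ["RespondedExistence(A, B)"],
    "ChainPrecedence": ["AlternatePrecedence"],
    "AlternatePrecedence": ["Precedence"],
    "Precedence": ["RespondedExistence(B, A)"],
}


def descendants(name: str) -> set:
    """All templates that are (transitively) implied by `name`."""
    seen = set()
    stack = list(IMPLIES.get(name, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IMPLIES.get(current, []))
    return seen


def strictest(satisfied) -> set:
    """Drop every satisfied template that another satisfied one already implies."""
    implied = set().union(*(descendants(c) for c in satisfied))
    return set(satisfied) - implied
```

With all the templates of the example satisfied, `strictest` returns only `{"ChainSuccession"}`. If only Response, Precedence and the two RespondedExistence constraints hold, it keeps Response and Precedence.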


Constraint templates are not independent of each other


Page 20: Claudio Di  Ciccio , Massimo  Mecella

MINERful

• Key idea: building a knowledge base with local and global statistics on the mutual order of appearance of events for further fast querying

• Performance: the algorithm has proven to be fast (over 12m events processed in less than 170 s)

• Asymptotically:

• linear in the number of traces

• quadratic in the number of events per trace

• i.e., polynomial in the input size

• linear in the number of constraint templates

• See [DiCiccioMecella/TR12] for further reading
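The idea of collecting statistics once and querying them afterwards can be illustrated with a toy knowledge base. This is not MINERful's data structure (see the technical report for that). It is a simplified sketch that records, for every ordered pair of activities, how many occurrences of the first are later followed by the second, and then answers a Response query from those counts. The names `build_knowledge_base` and `response_support` are hypothetical.

```python
from collections import Counter


def build_knowledge_base(log):
    """Scan the log once and collect occurrence statistics.

    Returns (occurrences, followed), where occurrences[a] is the total
    number of occurrences of a, and followed[(a, b)] is the number of
    occurrences of a that have at least one b later in the same trace.
    The nested scan is quadratic in the number of events per trace and
    linear in the number of traces.
    """
    occurrences = Counter()
    followed = Counter()
    for trace in log:
        for i, a in enumerate(trace):
            occurrences[a] += 1
            for b in set(trace[i + 1:]):
                followed[(a, b)] += 1
    return occurrences, followed


def response_support(kb, a, b):
    """Fraction of occurrences of a that are eventually followed by b."""
    occurrences, followed = kb
    if occurrences[a] == 0:
        return 1.0  # vacuously satisfied: a never occurs
    return followed[(a, b)] / occurrences[a]
```

Once the knowledge base is built, a query such as `response_support(kb, "A", "B")` is answered without re-reading the log. A threshold on this value could then decide whether Response(A, B) is reported.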


The declarative workflow mining algorithm of MailOfMine


Page 21: Claudio Di  Ciccio , Massimo  Mecella

LTL semantics

• ConDec, DecSerFlow and Declare adopt Linear-time Temporal Logic (LTL) for expressing the semantics of the constraint templates.

• See [AalstEtAl06, MaggiEtAl11] for further reading

• Van der Aalst, W. M. P.: “Auditing 2.0: Using Process Mining to Support Tomorrow's Auditor”

• Available at http://www.processmining.org/_media/presentations/process-mining-and-auditing-siks-course-2010-wvda.pdf

• See slides 61-67
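As an illustration of how LTL expresses such semantics, a few templates are written below in their commonly used LTL form, with $\Diamond$ for "eventually", $\Box$ for "always", $\bigcirc$ for "next" and $\mathcal{U}$ for "until". The formulas are the standard textbook formulations and may differ in notation from those in the cited slides.

```latex
\begin{align*}
\mathit{Existence}(A) &\equiv \Diamond A \\
\mathit{RespondedExistence}(A,B) &\equiv \Diamond A \rightarrow \Diamond B \\
\mathit{Response}(A,B) &\equiv \Box\,(A \rightarrow \Diamond B) \\
\mathit{AlternateResponse}(A,B) &\equiv \Box\,\bigl(A \rightarrow \bigcirc(\lnot A \;\mathcal{U}\; B)\bigr) \\
\mathit{ChainResponse}(A,B) &\equiv \Box\,(A \rightarrow \bigcirc B) \\
\mathit{Precedence}(A,B) &\equiv (\lnot B \;\mathcal{U}\; A) \lor \Box\,\lnot B \\
\mathit{NotCoExistence}(A,B) &\equiv \lnot(\Diamond A \land \Diamond B)
\end{align*}
```

Reading the third line: at every point of the run, if A holds then B holds at some later point, which matches the informal definition of Response(A, B) given with the relation templates.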


Page 22: Claudio Di  Ciccio , Massimo  Mecella

On the representation of artful process schemata

• In MailOfMine, each constraint in the set which can be used to define an artful mined process is expressible through regular grammars, where:

– activities are terminal characters, building blocks of constraints on tasks;

– constraints are regular expressions, equivalent to regular grammars;

– the process scheme is the intersection of constraints defined on top of activities.

• The process scheme defines a Process Describing Grammar (PDG)
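The view of a process scheme as an intersection of regular languages can be sketched with Python's `re` module. The regular expressions below are illustrative formulations written for this example, over single-character activities, and are not necessarily the ones used in MailOfMine. A trace belongs to the intersection exactly when it fully matches every constraint.

```python
import re

# Hypothetical regular expressions for two constraints on activities A and B.
CONSTRAINTS = {
    # the last A, if any, is followed by a B
    "Response(A, B)": r"[^A]*(A.*B)*[^A]*",
    # the first B, if any, is preceded by an A
    "Precedence(A, B)": r"[^B]*(A.*B)*[^B]*",
}


def violated(trace: str, constraints=CONSTRAINTS) -> list:
    """Names of the constraints whose language does not contain the trace."""
    return [name for name, pattern in constraints.items()
            if re.fullmatch(pattern, trace) is None]


def accepts(trace: str, constraints=CONSTRAINTS) -> bool:
    """True if the trace lies in the intersection of all constraint languages."""
    return not violated(trace, constraints)
```

For example, `accepts("CAACB")` holds, while `violated("CAC")` reports `["Response(A, B)"]`. An equivalent check could build one automaton per constraint and take their product. Matching each expression in turn is simply the shortest way to show the idea.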

Regular grammars expressing declarative workflows


Page 23: Claudio Di  Ciccio , Massimo  Mecella

On the usage of regular grammars

The rationale: why not LTL for declarative workflows?

• Temporal logic is a formalism for describing sequences of transitions between states in a reactive system

• Linear Temporal Logic (LTL, [Pnueli77]) describes events along a single computation path

• LTL formulæ are verified over semi-infinite runs

– defined over Kripke structures

• They are good for automatically checking the correct operation of circuits or server programs

– Not for human processes

• which have both a starting point and an end

“In the long run, we are all dead” (John Maynard Keynes)

• Regular grammars are verified by Finite State Automata

– working with less complex algorithms, in terms of computational effort

• A PDG describes the language spoken by collaborative organisms in terms of activities


Page 24: Claudio Di  Ciccio , Massimo  Mecella

On the usage of regular grammars

Constraint templates as regular expressions

Page 25: Claudio Di  Ciccio , Massimo  Mecella

On the visualization of processes

An example of DecSerFlow [AalstEtAl06] notation

No, it is not the initial action

You could even start from here

• You might want to run a legal trace like this:

⟨ a3, a3, a3, a2, a2, a3, a4, a5, a6, a7, a6, a5 ⟩

• What we want to state here is that such a notation is probably not very intuitive


Page 26: Claudio Di  Ciccio , Massimo  Mecella

On the visualization of processes

Our proposal

• We do not consider a static graph-based global representation alone to be the most suitable solution.

• A graphical representation, easy to understand at first glance, must be used.

• Idea:

– when presenting the process schema (static view):

1) a local view on tasks/activities, showing related constraints only;

2) a global view on the process, either:

a) basic (less information, fewer symbols), or

b) extended (more information, more symbols, extending (a));

– (2) can work as a kind of navigation map for (1)

– when presenting the running instance (dynamic view):

1) a dynamic interactive trace representation diagram, based on the local static view notation.

• See [DiCiccioEtAl11] for further reading


Page 27: Claudio Di  Ciccio , Massimo  Mecella

On the visualization of processes

Introducing the new local view: the rationale

Page 28: Claudio Di  Ciccio , Massimo  Mecella

On the visualization of constraints

The static local view: some examples

Page 29: Claudio Di  Ciccio , Massimo  Mecella

On the representation of processes

The static global view

Basic / Extended

Page 30: Claudio Di  Ciccio , Massimo  Mecella

A GUI sketch

Local and global views together

Page 31: Claudio Di  Ciccio , Massimo  Mecella

On the representation of constraints

Dynamic view

Page 32: Claudio Di  Ciccio , Massimo  Mecella

Open challenges

• Event frequency handling in MINERful

• Error injection and robustness testing/improving

• Auto-thresholding

• Definition of a basis for declarative processes

• Graphical model for declarative processes in MailOfMine

• Implementation and usability testing

• Auto-refactoring of the dynamic view

• Refactoring in case of user-driven deviations from the process model


Your contribution is welcome

Page 33: Claudio Di  Ciccio , Massimo  Mecella

References

• [Aalst2011.book] van der Aalst, W.M.P.: Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer (2011).

• [AalstEtAl2009] van der Aalst, W.M.P., van Dongen, B.F., Günther, C.W., Rozinat, A., Verbeek, E., Weijters, T.: ProM: The process mining toolkit. In de Medeiros, A.K.A., Weber, B., eds.: BPM (Demos). Volume 489 of CEUR Workshop Proceedings, CEUR-WS.org (2009)

• [AalstEtAl2004] van der Aalst, W.M.P., Weijters, T., Maruster, L.: Workflow mining: Discovering process models from event logs. IEEE Trans. Knowl. Data Eng. 16(9) (2004) 1128–1142.

• [WenEtAl2007] Wen, L., van der Aalst, W.M.P., Wang, J., Sun, J.: Mining process models with non-free-choice constructs. Data Min. Knowl. Discov. 15(2) (2007) 145–180.

• [GüntherEtAl2007] Günther, C.W., van der Aalst, W.M.P.: Fuzzy Mining - Adaptive Process Simplification Based on Multi-perspective Metrics. BPM 2007: 328-343.

• [WeijtersEtAl2001] Weijters, A., van der Aalst, W.: Rediscovering workflow models from event-based data using little thumb. Integrated Computer-Aided Engineering 10 (2001) 2003.

• [MedeirosEtAl2007] Medeiros, A.K., Weijters, A.J., Aalst, W.M.: Genetic process mining: an experimental evaluation. Data Min. Knowl. Discov. 14(2) (2007) 245–304.

• [AalstEtAl2010] van der Aalst, W., Rubin, V., Verbeek, H., van Dongen, B., Kindler, E., Günther, C.: Process mining: a two-step approach to balance between underfitting and overfitting. Software and Systems Modeling 9 (2010) 87–111, doi:10.1007/s10270-008-0106-z.

Cited articles and resources, in order of appearance


Page 34: Claudio Di  Ciccio , Massimo  Mecella

References

• [HillEtAl06] Hill, C., Yates, R., Jones, C., Kogan, S.L.: Beyond predictable workflows: Enhancing productivity in artful business processes. IBM Systems Journal 45(4), 663–682 (2006)

• [ACTIVE09] Warren, P., Kings, N., et al.: Improving knowledge worker productivity - the active integrated approach. BT Technology Journal 26(2), 165–176 (2009)

• [DiCiccioEtAl11] Di Ciccio, C., Mecella, M., Catarci, T.: Representing and Visualizing Mined Artful Processes in MailOfMine. USAB 2011:83-94

• [DiCiccioMecella12] Di Ciccio, C., Mecella, M.: Mining constraints for artful processes. In W. Abramowicz, D. Kriksciuniene, V.S., ed.: 15th International Conference on Business Information Systems. Volume 117 of Lecture Notes in Business Information Processing, Springer (2012) (to appear).

• [DiCiccioMecella/TR12] Di Ciccio, C., Mecella, M.: MINERful, a mining algorithm for declarative process constraints in MailOfMine. Technical report, Dipartimento di Ingegneria Informatica, Automatica e Gestionale “Antonio Ruberti” – SAPIENZA, Università di Roma (2012).

• [AalstEtAl06] van der Aalst, W.M.P., Pesic, M.: DecSerFlow: Towards a truly declarative service flow language. Proc. WS-FM 2006

• [MaggiEtAl11] Maggi, F.M., Mooij, A.J., van der Aalst, W.M.P.: User-guided discovery of declarative process models. In: CIDM, IEEE (2011) 192–199

• [Pnueli77] Pnueli, A.: The Temporal Logic of Programs. Proc. 18th Annual Symposium on Foundations of Computer Science (FOCS), 1977

Cited articles and resources, in order of appearance
