


Eindhoven University of Technology

MASTER

Combining model learning results for interface protocol inference

Yang, N.

Award date: 2018

Link to publication

Disclaimer
This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.

General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain.

Combining model learning results for interface protocol inference

Master’s Thesis

Nan Yang

Department of Mathematics and Computer Science
Software Engineering and Technology Group

Supervisors:
dr. Alexander Serebrenik
dr. ir. Ramon Schiffelers

M.Sc. Kousar Aslam

Final version

Eindhoven, March 2018

Abstract

ASML manufactures complex lithography machines that are used to produce integrated circuits (ICs). Such a machine is a highly complex cyber-physical system containing a huge amount of software. ASML uses model-driven engineering (MDE) techniques to develop software. MDE has proved useful in increasing efficiency, reducing the learn-in times of employees, and increasing the quality of the software. However, a large part of the existing software has been developed with traditional engineering practices, e.g. manual coding. Behavioral models describing the behavior over interfaces are required to understand, refactor, and modernize this traditionally engineered software. Since it is very time-consuming to re-engineer models for this part of the software manually, ASML aims for a cost-effective transition from existing software code to models using efficient automated model inference techniques.

Several model inference techniques are available in the literature to retrieve the behavior from software. These techniques can be categorized as static or dynamic. Static software analysis involves code inspection, reasoning over possible behaviors without executing the software; its results do not contain the unknown run-time properties of the software. Dynamic software analysis techniques analyze the actual execution of the software, either by analyzing execution traces (passive learning) or by interacting with software components (active learning).

Active learning techniques exhaustively query the System Under Learning (SUL) in an isolated setup and thus learn a model describing the implemented behavior. Passive learning techniques learn the partial interactions between the SUL and its environment from (usually incomplete) communication logs. A previous study suggests that the learned results of these two techniques are complementary for inferring interface protocols. In this thesis we investigate methods to combine active and passive learning results for interface protocol inference.

First of all, we study the challenges of applying active and passive learning techniques for interface protocol inference and their impact on the learned results. Second, we propose an approach to combine the results. This approach analyzes the relations between the learned results in terms of behavior, addresses some of the challenges by incorporating different learned results, and finally combines the intended results with the involvement of expert knowledge.

The steps of obtaining learned results from different techniques, analyzing the quality of the learned results, and the proposed combining approach are integrated into an interface protocol inference workflow. We concretize the workflow by applying it to an ASML software component. This case study can serve as a practical guideline for future studies.

As a result, our study gives an overview of the challenges of applying learning techniques to industrial software. We propose a way to analyze the relations between learned results obtained from different techniques and logs. Moreover, we suggest a method to combine the learned results for interface protocol inference.

Keywords: Active learning, Passive learning, Interface protocol inference, Combining model learning results

Acknowledgements

Ten months ago, I started this project with a lot of curiosity about academic research. It has been a journey of self-discovery for me. I believe the things I learned from this project are invaluable and will have a profound impact on my future career and life. Here I would like to express my gratitude and appreciation to the people who supported me throughout this period.

First of all, I would like to thank my supervisors, Dr. Alexander Serebrenik and Dr. ir. Ramon Schiffelers, for their guidance. I have enjoyed every discussion we had together. There is a Chinese idiom about teaching: educating students in accordance with their aptitude. My supervisors implemented this principle with great patience. They gave me intense attention and offered me chances to improve not only my academic knowledge but also my soft skills. They taught me how to overcome the fear of uncertainty and embrace it. Now I see my personal growth and cannot thank them enough for their coaching.

I would like to express my deep gratitude to my daily supervisor, Kousar Aslam. She gave me a very warm welcome on the first day and continuously supported me throughout the project. Her encouragement has been very helpful in allowing me to express my ideas and doubts. It is my honor to be her student.

I would like to thank Yaping Luo for her useful advice on project management. I would like to thank Prof. dr. ir. Jeroen Voeten and Dennis Henderiks for their constructive suggestions on my work. I would like to thank Thomas Neele for his advice on my thesis. Moreover, I would also like to thank the other colleagues within the ASML SW Research Group. Their critical attitude toward research influenced my methods of studying.

Additionally, I would like to acknowledge my exam committee member, Dr. Pieter Cuijpers, for offering his time to review this thesis and attend the defense.

Moreover, I would like to thank my friends for their family-like friendship. I would like to thank all the TU/e librarians for their daily warm greetings. Finally, I would like to use this opportunity to thank my mom for supporting me in stepping outside of my comfort zone and exploring the unknown.

Nan Yang
Eindhoven, March 2018

Contents

Contents iv

List of Figures vi

List of Tables vii

1 Introduction 1
  1.1 Research statement and scope . . . 2
    1.1.1 Challenges of techniques . . . 2
    1.1.2 Combination of learned models . . . 3
  1.2 Contribution . . . 3
  1.3 Thesis outline . . . 3

2 Preliminaries: active and passive learning 4
  2.1 Model inference . . . 4
  2.2 Active learning . . . 5
  2.3 Passive learning . . . 6
    2.3.1 State merging algorithms . . . 6
    2.3.2 Process mining . . . 6
    2.3.3 Comparison of state merging algorithms and process mining . . . 7
  2.4 Complementary nature of active and passive learning results . . . 10

3 Interface protocol inference workflow 11
  3.1 Preparation . . . 11
  3.2 Learning . . . 11
    3.2.1 Active learning . . . 12
    3.2.2 Passive learning . . . 13
    3.2.3 Quality control . . . 14
  3.3 Combining . . . 14

4 Challenges of interface protocol inference 16
  4.1 Active learning . . . 16
    4.1.1 Establishing abstraction . . . 17
    4.1.2 Ensuring the completeness of learning results . . . 18
  4.2 Passive learning . . . 18
    4.2.1 Segmenting logs . . . 18
    4.2.2 Generating negative examples . . . 20
    4.2.3 Evaluating completeness of logs . . . 20

5 Combining approach 22
  5.1 Assumptions . . . 22
  5.2 Step 1: ensuring log is included in the active learning result . . . 23
  5.3 Step 2: classification of overapproximation within the passive learning result . . . 24
  5.4 Step 3: classification of overapproximation within the active learning result . . . 25
  5.5 Generating traces from automata . . . 26
    5.5.1 Model-based testing . . . 27
    5.5.2 W-Method . . . 27
    5.5.3 Function GetTraces . . . 28
    5.5.4 Limitations . . . 29
  5.6 Discussion and related works . . . 29

6 Case study: interface protocol inference at ASML 31
  6.1 Interface protocols at ASML . . . 31
    6.1.1 Traditional software development . . . 31
    6.1.2 Model-driven engineering . . . 34
    6.1.3 Conclusion . . . 36
  6.2 Case selection and goal . . . 36
    6.2.1 The SUL . . . 37
  6.3 Execution . . . 37
    6.3.1 Preparation . . . 37
    6.3.2 Learning . . . 39
    6.3.3 Combining . . . 43
  6.4 Discussion . . . 45

7 Conclusion and future research 46
  7.1 Conclusion . . . 46
  7.2 Future research . . . 47
    7.2.1 Learning data-flow . . . 47
    7.2.2 Evaluating passive learning algorithms . . . 47
    7.2.3 Mining property instances . . . 47
    7.2.4 Combining techniques in learning process . . . 48
    7.2.5 Integrating existing domain knowledge . . . 48

Bibliography 49

List of Figures

2.1 Active learning setup . . . 5
2.2 A "flower model" . . . 7

3.1 Overview of interface protocol inference workflow . . . 12

4.1 Abstraction layer . . . 17
4.2 A state machine example for log segmentation . . . 19
4.3 Passive learning result for the log {a, b, c, d} . . . 19
4.4 Different passive learning results for the log {a, b, a, b, a, b} . . . 19

5.1 A simple system scenario . . . 22
5.2 The relations between M_AL/IF and Log_IF . . . 23
5.3 The relations between L(M_AL/IF), L(M'_AL/IF) and Log_IF . . . 24
5.4 The relations between L(M''_AL/IF), L(M'_AL/IF), L(M_AL/IF) and Log_IF . . . 25
5.5 Confidence in the interface protocol behavioral inclusion . . . 26
5.6 An example for explaining the W-Method . . . 28

6.1 Traditional development process . . . 32
6.2 Non-conformity between interface protocol and implementation . . . 33
6.3 Model-driven engineering . . . 35
6.4 ASD example: alarm system . . . 35
6.5 The context of LOPW wtc component . . . 37
6.6 An overview of the workflow execution for the ASML case study . . . 38
6.7 Dotted chart for the input logs . . . 41
6.8 Fitness and precision measurement for M_PL/IF (k=1) (left), M_PL/IF (k=2) (middle) and M_AL/IF (right) . . . 42
6.9 The relation between M_PL/IF and M_AL/IF . . . 43
6.10 The precision for the combined model . . . 45

List of Tables

2.1 Strengths and weaknesses of active and passive learning results for interface protocol inference . . . 10

5.1 Matrix for trace categories . . . 29

6.1 A summary of the length of the shown patterns . . . 41
6.3 Unresolved traces . . . 44

Chapter 1

Introduction

ASML manufactures photolithography machines that are used in the production process of integrated circuits. The machines consist of many intercommunicating components which are under the control of software. Currently, the software base of an ASML machine is roughly 50 million lines of code. It is a challenge to develop and maintain software on such a large scale. In the traditional software development process, software is manually coded according to a specification. During its lifetime, this software is frequently changed, in line with Lehman's laws of software evolution [28]. As stated in Lehman's laws, it is not uncommon that, over time, the software and its documentation get out of sync, i.e. they no longer correspond to each other, and that the available knowledge about the software decreases due to career changes of the original software developers.

In recent years, model-driven engineering (MDE) has been adopted within the ASML software development team. Unlike the traditional development process, MDE uses models as the main software artifacts. By modeling, developers define the specification by means of formal models and then formally verify the behavioral correctness of the software in the early phases of development. This paradigm allows developers to abstract from implementation details, which is expected to facilitate not only communication among different roles of engineers but also long-term software maintenance [18].

As a result, the demand to shift from traditional software engineering towards the MDE paradigm is growing in ASML, to promote both technical and business performance. Because the quality of the software is formally verified at the model level, it reduces the development and maintenance overhead [35].

However, in ASML's existing software base, there is still a large portion of software that was developed with the traditional software development process. To increase the portion of MDE-based software, it is necessary to obtain models from the traditionally engineered software. Considering the complexity of the software, understanding it and then modeling its behavior from scratch can be time-consuming and laborious. Therefore, inferring models from the existing software base seems to be more efficient and scalable.

The ASML software research group has been working on model inference projects for a few years. Two main conclusions have been drawn from their previous studies. The first conclusion concerns which part of the software behavior should be inferred. In practice, when facilitating software modernization in a component-based software system, it is essential to preserve the external behavior of a component so that the overall functionality of the system remains unchanged; the internal behavior can then be redesigned. Considering this design strategy, the goal was formulated as inferring models that describe the interface protocol between components.

The second conclusion has been drawn from the studies of model learning techniques. In the

general domain of software analysis, techniques can be classified into static and dynamic analysis techniques [32]. Static analysis, as a white-box approach, involves code inspection, reasoning over possible behaviors without executing the software [13]. The analysis results are mostly conservative due to the unknown run-time properties of the software [52]. Dynamic analysis techniques, on the other hand, require the software to be executed to learn its runtime behavior, which might not be learnable from a static perspective. Depending on when the analysis is performed, dynamic analysis can be further categorized into online and offline analysis. Online analysis is applied while the software is running, whereas offline analysis is performed on records after the execution has finished.

Model learning is a set of techniques that consider software components as black boxes and aim to construct state diagram models from them. This set of techniques is a subset of dynamic analysis techniques, requiring the execution of the software. It can be further categorized into active and passive learning techniques, corresponding to the concepts of online and offline techniques, respectively. In general, these two categories of techniques have their own strengths and limitations. Active learning techniques require the isolation of the System Under Learning (SUL) from its system environment; the SUL is connected to a piece of software, a so-called learner, which is capable of iteratively asking queries and learning automata based on the SUL's responses. Using such techniques, the complete externally visible behavior of the implementation can be learned, as the learner exhaustively queries the SUL until the learned automaton has converged. However, because of the isolation, it cannot be learned how the SUL is actually being used by its natural environment. Passive learning techniques learn automata based on execution logs that are observed over the interface between the SUL and its environment. Such execution logs contain information about how the SUL interacts with its environment. In practice, execution logs are captured during some period of time, and it is likely that the SUL was not used in all the ways it usually will be used by its environment. The automata learned from such logs only contain the observed behavior.

The initial study of active and passive learning techniques revealed that they have complementary strengths and weaknesses, which leads to a reasonable hypothesis that combining their learning results could produce a better method for interface protocol inference.

1.1 Research statement and scope

The main goal of this research is to develop a method to (semi-automatically) infer interface protocols for existing component-based software. For this, we first investigate the challenges of existing model inference techniques. Then, we analyze the relations among their respective inference results in order to propose a possible combination in terms of an interface protocol inference method.

1.1.1 Challenges of techniques

Active and passive learning techniques have their own challenges. We study how those techniques can contribute to the aim of inferring interface protocols, what challenges we face in practice, and how we could possibly solve them.

Our research regarding the challenges of different dynamic model learning techniques is driven by the following research questions:

RQ 1: What are the challenges of different techniques for learning interface protocols?

RQ 2: How can different techniques be combined to overcome their challenges?

We discuss the challenges in Chapter 4. The solutions for some of them are discussed in Chapter 5.

1.1.2 Combination of learned models

Having discovered the limitations of the techniques and the challenges in applying them, we need to place all the elements together and analyze the relations among them so that a possible combination approach can be proposed.

Our research regarding the combination of learned models obtained from different techniques is driven by the following research questions:

RQ 3: How can the learned models, the SUL, and the interface protocol be related to each other?

RQ 4: How to analyze and compare the inferred behavior in learned models?

RQ 5: How to combine learned models based on analysis results?

We address these research questions in Chapter 5.

1.2 Contribution

The main contributions of our work are as follows.

• An overview of challenges for applying learning techniques on industrial software. We have summarized the respective challenges that may be encountered when using active and passive learning techniques. In particular, we have also identified some challenges for combining learning techniques. The solutions for those challenges have been discussed.

• A way to analyze the relations between learned models and logs. We have analyzed the relations between learned models and logs in terms of behavior. The overlapping and differing parts of the behavior have been interpreted, and for each part we have discussed the confidence in its being a subset of the interface protocol. Our analysis approach can serve as the basis for combining learned models obtained from different techniques.

• A method to combine learning results for interface protocol inference. We have proposed a workflow illustrating each step of inferring interface protocols. This workflow integrates analysis techniques from the process mining domain and involves expert knowledge in classifying the learned overapproximation. In addition, our experience of applying the workflow to an ASML software component provides a guideline for future case studies.

1.3 Thesis outline

This thesis is organized as follows. Chapter 2 describes the preliminaries for this research: active and passive learning techniques are described, and the differences between different types of passive learning techniques are discussed. The proposed workflow for interface protocol inference is introduced in Chapter 3, which gives a high-level overview of how to infer interface protocols. Chapter 4 introduces the challenges of applying techniques for interface protocol inference and discusses possible solutions for them. Chapter 5 illustrates the proposed approach for combining active and passive learning results. A case study applying the workflow to an ASML component is then presented in Chapter 6. Finally, the conclusions and future research are presented in Chapter 7.

Chapter 2

Preliminaries: active and passive learning

This chapter introduces active and passive learning techniques based on our previous studies [33]. Their inherent properties, and in particular their limitations for interface protocol inference, are discussed.

Section 2.1 briefly introduces the existing model inference techniques and their classification. Sections 2.2 and 2.3 introduce these two categories of techniques, respectively. Finally, Section 2.4 discusses their complementary nature in terms of their learning results.

2.1 Model inference

Model inference is the task of extracting knowledge and capturing it in behavioral models in an automated way. In the software engineering domain, the aim of inference is to derive a model that describes the behavior of a system, usually called the System Under Learning (SUL), in such a way that valuable business information can be extracted [42].

Work in this area dates back to the 1960s, when several researchers focused on inferring, from a set of sentences, the regular language that can be represented by a finite automaton. In 1967, Gold [14] proved that it requires an infinite number of positive examples (i.e., sentences accepted by the automaton) to infer any infinite language. This result indicated the importance of negative examples (i.e., sentences not accepted by the automaton) for producing correct results.

To address this problem, a number of techniques have emerged. They can be broadly classified into two types, namely active and passive learning. The baseline algorithm in the active learning field is L* [5], developed by Angluin in 1987. This type of technique treats the SUL as a black box, collecting positive and negative examples by iteratively posing queries to the SUL and subsequently observing its responses. In contrast, passive learning uses traces to construct models. A trace consists of a sequence of activities (or events) that happen during software execution. The majority of work in this field is based on the idea of state merging, which was originally proposed by Trakhtenbrot and Barzdin [40]. This work was developed under the assumption that a complete set of traces (both positive and negative examples) is provided.
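To make the role of negative examples concrete, consider the following illustrative sketch (our own; the alternating-symbol language and all names are hypothetical, not taken from the thesis). A maximally general single-state automaton that accepts every string, akin to a "flower model", is consistent with any set of positive examples, so only a negative example can expose its overgeneralization:

```python
# Sketch: why positive examples alone cannot rule out overgeneralization.

def flower_accepts(trace):
    """One accepting state with a self-loop for every symbol: accepts everything."""
    return True

def target_accepts(trace):
    """Hypothetical target language: 'a' and 'b' strictly alternate, starting with 'a'."""
    expected = 'a'
    for symbol in trace:
        if symbol != expected:
            return False
        expected = 'b' if expected == 'a' else 'a'
    return True

positive_examples = ['', 'a', 'ab', 'aba', 'abab']

# The flower model agrees with the target on every positive example ...
assert all(flower_accepts(t) and target_accepts(t) for t in positive_examples)

# ... yet it also accepts strings the target rejects. Only a negative
# example such as 'bb' reveals that the flower model is too general.
assert flower_accepts('bb') and not target_accepts('bb')
```

Any inference algorithm given only the positive examples has no grounds to prefer the target automaton over the trivially consistent flower model, which is exactly the situation Gold's result describes.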

Process mining [9], as a process management technique, has been developed and used to analyze trends and patterns based on event logs (where usually only positive examples exist) recorded from an information system. We are aware that significant differences between process mining and state merging algorithms exist, from theory to application. However, in this thesis, we categorize both

state merging algorithms and process mining techniques as passive learning, because they derive models from the provided traces. Their major differences are discussed in Section 2.3.3.

2.2 Active learning

In 1987, Angluin proposed the L* algorithm [5]. This algorithm lays the foundation for active learning techniques. It assumes the presence of a teacher who provides correct answers to membership and equivalence queries. Figure 2.1 shows a schematic overview of the algorithm. The learner constructs a hypothesis model incrementally by asking membership queries (MQs), which are input sequences. The teacher consists of the SUL and a conformance tester. The SUL provides answers in response to MQs. After several membership queries, the learner constructs an initial hypothesis, which is posed as an equivalence query (EQ). Then testing queries (TQs) are generated, using conformance testing methods such as the W-method [8] and Hopcroft [4], to check the conformance between the hypothesis and the SUL. The conformance tester answers "yes" to the learner if no evidence can be found to distinguish the hypothesis from the SUL. Otherwise, a counterexample (i.e., a trace that exposes a difference) is returned. The counterexample provides guidance to the learner so that it does not learn randomly. If the learner obtains a counterexample from the teacher, it refines the hypothesis and iterates the process until the hypothesis eventually converges to the SUL.
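The query interplay described above can be sketched in a few lines of Python. This is our own simplification, not an implementation from the L* literature: the teacher answers membership queries from a hidden SUL, and the conformance tester is approximated by a bounded exhaustive comparison that returns a counterexample whenever the hypothesis disagrees with the SUL.

```python
# Minimal sketch of the "teacher" in Angluin-style active learning.
from itertools import product

class Teacher:
    def __init__(self, accepts, alphabet, max_len=6):
        self._accepts = accepts      # hidden SUL behavior (black box)
        self._alphabet = alphabet
        self._max_len = max_len

    def membership_query(self, word):
        # MQ: does the SUL accept this input sequence?
        return self._accepts(word)

    def equivalence_query(self, hypothesis_accepts):
        # EQ, approximated: compare the hypothesis with the SUL on all
        # words up to max_len. Real setups use conformance testing
        # (e.g. the W-method) instead of exhaustive enumeration.
        for n in range(self._max_len + 1):
            for letters in product(self._alphabet, repeat=n):
                word = ''.join(letters)
                if hypothesis_accepts(word) != self._accepts(word):
                    return word          # counterexample
        return None                      # "yes": no difference found

# Hypothetical SUL: accepts words containing an even number of 'a's.
teacher = Teacher(lambda w: w.count('a') % 2 == 0, alphabet='ab')

assert teacher.membership_query('aa') is True
assert teacher.membership_query('ab') is False

# A naive first hypothesis ("accept everything") is refuted with a
# counterexample, which guides the learner's next refinement step.
assert teacher.equivalence_query(lambda w: True) == 'a'
```

The learner side (building the hypothesis from an observation table) is omitted here; the point is only to show how MQs, EQs, and counterexamples interact.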

L* assumes that the SUL can be represented by a Deterministic Finite Automaton (DFA), and does not handle data. Many academic prototypes have been developed based on L* to relax these limitations. For example, L* only copes with deterministic behavior; Volpato and Tretmans introduced an adaptation of L* [49] to infer models from nondeterministic systems. Moreover, as input and output actions can be parameterized, Extended Finite State Machines (EFSMs) are used to model the data flow of a component. A hybrid learning technique that combines L* with symbolic analysis to learn data-guarded behavior was proposed by Howar et al. [21].

In practice, it can be time-consuming and costly to apply query-based techniques, such as active learning, to a (large-scale) system, as a large number of queries might be required. Hungar et al. [24] observed that learning a 32-state system requires 19,426 MQs, and that the required amount rises significantly to 132,340 MQs for 80 states. Several variants, such as TTT [25] and Rivest-Schapire [38], were proposed to improve the efficiency of learning by reducing redundant queries.

Figure 2.1: Active learning setup [42]


2.3 Passive learning

2.3.1 State merging algorithms

State merging algorithms learn a DFA from a given set of traces. First, the algorithms use the positive traces to construct a state machine called a Prefix Tree Acceptor (PTA), in which two strings reach the same state if they share the same prefix up to that state. Then, pairs of states that are considered equivalent are merged iteratively. One can think of merging as a process of generalization. In each iteration, the merge is validated by a specific rule to avoid overgeneralization. Only merges that do not violate the rule are valid. The differences among existing state merging algorithms mainly lie in the definition of equivalence and in the validation rule. Here we briefly introduce two algorithms and their definitions for these two criteria.
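The PTA construction can be sketched as follows. This is a minimal illustration; the trace set and the dictionary-based automaton encoding are our own choices, not the data structures of any particular tool.

```python
def build_pta(positive_traces):
    """Build a Prefix Tree Acceptor: two strings reach the same state
    exactly when they share the same prefix up to that state."""
    transitions = {0: {}}   # transitions[state][symbol] -> next state
    accepting = set()
    next_id = 1
    for trace in positive_traces:
        state = 0
        for symbol in trace:
            if symbol not in transitions[state]:
                transitions[state][symbol] = next_id
                transitions[next_id] = {}
                next_id += 1
            state = transitions[state][symbol]
        accepting.add(state)  # each whole trace is a positive example
    return transitions, accepting

# "ab" and "ac" share the prefix "a", so they pass through the same state.
trans, acc = build_pta(["ab", "ac", "b"])
print(trans[0])  # → {'a': 1, 'b': 4}
```

State merging algorithms then generalize this tree by iteratively merging states.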

The K-tails algorithm [7] merges a pair of states that have identical suffixes of length k. It does not have a validation rule. Therefore, the degree of generalization is controlled completely by the value of k. A lower value of k leads to more merges.
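The k-tails merge criterion can be sketched as follows, with states represented by the prefixes that reach them. This encoding, and the use of observed continuations of length at most k as the "tails", are illustrative simplifications rather than the original algorithm's data structures.

```python
from collections import defaultdict

def k_tails_partition(traces, k):
    """Group the prefixes (PTA states) of a positive trace set by their
    k-tails: the continuations of length <= k observed in the traces.
    Prefixes with identical k-tails are candidates for merging."""
    tails = defaultdict(set)
    for t in traces:
        for i in range(len(t) + 1):
            tails[t[:i]].add(t[i:i + k])
    groups = defaultdict(list)
    for prefix, tail in tails.items():
        groups[frozenset(tail)].append(prefix)
    return [sorted(g) for g in groups.values() if len(g) > 1]

# The states reached after "a" and after "c" have the same 1-tail {"b"},
# so k-tails with k = 1 would merge them.
print(k_tails_partition(["ab", "cb"], k=1))
```

A larger k demands longer matching suffixes, so fewer states share a tail and fewer merges happen.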

The Regular Positive and Negative Inference (RPNI) [34] algorithm attempts to merge as many pairs of states as possible. It relies on negative examples to validate each merge. For this algorithm, the availability of negative examples is vital. Without negative examples, the algorithm simply produces a so-called flower model, consisting of a single state with a self-loop transition for every symbol, which accepts any sequence. The RPNI algorithm guarantees an exact identification of the DFA if the input trace set satisfies the notion of characteristic. Here we introduce this notion informally; we refer to the formal definition given by Dupont et al. [11]. There are two conditions for this notion. Assume there is a target DFA M representing the SUL. The first condition is called structural completeness with respect to M. It requires that, with the provided set of positive traces, all transitions of M are visited at least once, and every accepting state in M is used as the accepting state of at least one trace [11]. This condition guarantees that the provided positive trace set is representative enough of the language described by M. The second condition guarantees that for each pair of states in M there is at least one negative trace available to distinguish them, so that this pair of states will not be merged during learning (i.e., it avoids overgeneralization).

Some state merging algorithms can also deal with data. For example, Walkinshaw [53] introduced a way to combine data mining techniques with state merging algorithms to learn data flow.

2.3.2 Process mining

Process mining was initially developed as a toolset to discover processes from event logs, primarily applied in the field of business process modeling. Van der Aalst [43] gives an overview and historical perspective of this field.

In the process mining community, a number of approaches are used to develop algorithms. There is a family of algorithms that uses the ordering relations between events shown in logs to construct models. The alpha algorithm [45] and its variants [54] are examples of this approach. One of the most advanced process discovery algorithms, inductive mining [27], uses a divide-and-conquer approach to split the event log recursively into sublogs. The idea is to divide events into groups and then discover the relations among the groups.

Furthermore, an event log contains not only a sequence of events, but also the timestamp at which each event occurs. In addition, the resource usage associated with each event may also be logged, which enriches the discovered process with different views, such as a performance view in which the duration of each process is shown. De Leoni [10] gives an overview of which dynamic characteristics can be captured with process mining.


Figure 2.2: A "flower model" accepting any logs containing events from {a,b,c,d,e,f,g,h} [47]

2.3.3 Comparison of state merging algorithms and process mining

Although both state merging algorithms and process mining use existing trace sets to construct models, there are many differences between them. This section discusses the main differences and their influence on the learning results.

2.3.3.1 Model quality criteria

Model quality criteria are used to evaluate the quality of the learned models. In the process mining community, two related model quality criteria [47] are defined for quantifying the quality of learned models against the event logs: fitness and precision.

A model with perfect fitness can replay all the logs. A model with perfect precision allows only the behavior recorded in the log. In contrast, an overgeneralized model shows more behavior even when there are no indications of that behavior in the log, which decreases the precision of the model. The example shown in Figure 2.2 distinguishes fitness from precision. This "flower model" is an extreme model which accepts all traces starting with the event start, having any sequence of events from {a,b,c,d,e,f,g,h} in between, and ending with the event end. The fitness score of this model with respect to any log consisting of such traces is perfect. But this model clearly allows many more traces than what has been seen in the log, resulting in a low precision score. The other extreme is the "enumerating model", which simply encodes all traces in the log. It extremely overfits the log and does not allow any unobserved behavior. The precision and fitness scores for such a model are both perfect.
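These two extremes can be made concrete with a small sketch. The precision formula below is a crude proxy for the idea (share of model-allowed words actually observed), not the actual metric defined in [47].

```python
import itertools

def fitness(model_accepts, log):
    """Fraction of log traces the model can replay (1.0 = perfect)."""
    return sum(model_accepts(t) for t in log) / len(log)

def precision(model_accepts, log, alphabet, max_len):
    """Crude illustration of precision: the share of words the model
    accepts (up to max_len) that actually occur in the log."""
    allowed = ["".join(w)
               for n in range(max_len + 1)
               for w in itertools.product(alphabet, repeat=n)
               if model_accepts("".join(w))]
    observed = set(log)
    return sum(w in observed for w in allowed) / len(allowed)

log = ["ab", "ab", "ba"]
flower = lambda t: True                  # flower model: accepts everything
enumerating = lambda t: t in set(log)    # enumerating model: exactly the log

print(fitness(flower, log), fitness(enumerating, log))  # both perfect (1.0)
print(precision(flower, log, "ab", 2))       # low: 2 of 7 allowed words observed
print(precision(enumerating, log, "ab", 2))  # perfect (1.0)
```

The flower model scores perfect fitness but low precision; the enumerating model scores perfectly on both, at the cost of allowing no unobserved behavior at all.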

Different process mining algorithms inherently have different levels of capability to adjust these two quality dimensions. Both Inductive Mining [27] and Region-based Mining [47] guarantee perfect fitness, even though they result in different levels of precision. Region-based Mining ensures that only minimally more behavior than what is observed in the log is allowed (i.e., it focuses on higher precision). To obtain a simple structure, Inductive Mining tends to introduce a lot of concurrency (i.e., multiple events can be executed in parallel) and thus produces less precise models. Apart from that, the advanced algorithms usually expose many configurable parameters to users to address different needs.

Most state merging algorithms guarantee that the learned models accept all the positive examples and reject all the negative examples (i.e., perfect fitness). In addition, the concept of precision can be applied to state merging algorithms as well. Merging is a process of generalization: more merges result in a higher level of generalization (i.e., less precise models). In contrast, more rigorous validation of merges leads to a higher level of precision.

2.3.3.2 Model formalisms

Most state merging algorithms were developed to produce a DFA from a set of provided traces. The DFAs produced by these algorithms are prefix-closed, which means that if a trace is accepted
by the automaton, then all its prefixes are also accepted by that automaton. As a result, such DFAs contain only one rejecting state, reached by all the illegal transitions. Thus, these DFAs can safely be converted into corresponding Labeled Transition Systems (LTSs) by removing the rejecting state [23].
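The conversion amounts to a simple filter over the transition relation, as the following sketch shows; the state names and events are hypothetical.

```python
def dfa_to_lts(transitions, rejecting_state):
    """Turn a prefix-closed DFA into an LTS by dropping the single
    rejecting sink state and every transition into or out of it
    (a sketch of the construction referred to in [23])."""
    return {(s, a): t for (s, a), t in transitions.items()
            if s != rejecting_state and t != rejecting_state}

# Hypothetical DFA: states 0 (initial) and 1; state 9 is the rejecting sink.
dfa = {(0, "open"): 1, (1, "close"): 0,
       (0, "close"): 9, (1, "open"): 9,
       (9, "open"): 9, (9, "close"): 9}
print(dfa_to_lts(dfa, rejecting_state=9))
# → {(0, 'open'): 1, (1, 'close'): 0}
```

Only the legal behavior survives; illegal transitions simply become absent in the LTS.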

The results of most process mining algorithms are expressed as Petri nets rather than transition systems. The reason is that Petri nets allow concurrency to be modeled concisely, whilst transition systems can only enumerate interleavings. For business applications, where processes often happen concurrently, this representation promotes the expressiveness, simplicity and understandability of the learned models. However, undesired overgeneralization can be introduced in the process of discovering concurrency. To illustrate, assume that we have the set of sequences {ab, ba, ac, ca} in the log, where event a has a concurrent relation with b and with c. In order to reduce the duplicated actions used in models, Inductive Mining produces a model in which events a, b and c have a concurrent relation with one another, even though there is no indication of concurrency between events b and c in the log.

2.3.3.3 Negative examples

Process mining algorithms are built on the observation that event logs usually contain only positive examples, while negative examples are unavailable in most real cases. Another fact is that real-life event logs often contain noise (i.e., infrequent traces) and are usually sparse. Considering these inevitable facts, the aim of process mining is not to infer an accurate model but an approximate one. Theoretically, several state-merging-based algorithms, such as RPNI and EDSM, are guaranteed to produce an accurate model as long as the given trace set is characteristic. In reality, providing such a characteristic trace set is hard.

The practical challenge of lacking negative examples turns into a theoretical challenge, which motivates the development of algorithms that can approximate models. For example, a variant of Evidence-Driven State Merging (EDSM) proposed by Hammerschmidt [19] involves expert knowledge in validating merges to reduce the need for negative examples.

2.3.3.4 Incompleteness and noise

Similar to any typical data mining technique, passive learning is challenged by the incompleteness of data. The notion of completeness defines how complete the training data should be for an algorithm to accurately infer the target behavior. In reality, one cannot expect the training data to contain all possibilities.

In process mining, the notion of completeness for each algorithm has been given either explicitly or implicitly [47]. A strong notion of completeness requires more observations from the logs to infer the relations between activities, while a weak notion requires fewer observations. In order to reduce the need for massive numbers of observations, several algorithms with a weak notion of completeness have been developed. Van der Aalst [47] explains the completeness notion using the alpha-algorithm as an example; he considers a target model with 10 activities that can be executed concurrently. The number of possible interleavings of this concurrent behavior is 10! = 3,628,800. It is almost impossible to collect a set of event logs that contains all these possibilities. Hence, weak completeness notions are required to reduce the number of observations needed for learning such behavior. The alpha-algorithm uses a relatively weak notion of completeness called local completeness (i.e., if there are two activities a and b, and a can be directly followed by b, then this should be observed at least once in the log). With this notion, only 10 × (10 − 1) = 90 observations are required to learn the concurrent structure.
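The two counts from this example can be checked directly:

```python
import math

n = 10  # concurrently executable activities, as in Van der Aalst's example

# Global completeness would need every full interleaving in the log.
all_interleavings = math.factorial(n)   # 10! = 3,628,800

# Local completeness (alpha-algorithm) only needs each directly-follows
# pair (a, b) with a != b to be observed at least once.
directly_follows_pairs = n * (n - 1)    # 90

print(all_interleavings, directly_follows_pairs)  # → 3628800 90
```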

When it comes to state merging algorithms, the notion of completeness is defined as characteristic, which requires not only positive examples to reach every state but also negative examples to
distinguish every pair of non-equivalent states. It is a strong completeness notion in practice. Research has been carried out in two different directions to address the lack of data. One idea is to reduce the required number of traces, for example by avoiding the use of negative examples. Another is to supplement the traces by querying experts. Dupont and Lambeau introduced an algorithm called Query-driven State Merging (QSM) [11], which learns models from the provided traces and asks queries to prevent bad generalizations.

The definition of noise is different in the process mining context than in the context of state merging algorithms. In process mining, human judgment and domain knowledge are required to preprocess the logs and remove errors. This assumes that the input logs for the mining algorithms are error-free. Therefore, the term noise is used to refer to infrequent traces rather than logging errors [47]. The typical way to deal with noise is to integrate a configurable frequency threshold that ignores infrequent traces when discovering models.
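The frequency-threshold treatment of noise can be sketched as follows; the threshold value and the trace set are hypothetical.

```python
from collections import Counter

def filter_infrequent(log, threshold):
    """Drop trace variants occurring fewer than `threshold` times:
    the typical process-mining treatment of noise as infrequent traces."""
    counts = Counter(log)
    return [t for t in log if counts[t] >= threshold]

log = ["ab"] * 5 + ["ba"] * 4 + ["abb"]    # "abb" occurs only once
print(filter_infrequent(log, threshold=2))  # "abb" is treated as noise and removed
```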

In the context of state merging algorithms, the term noise refers to the mislabeling of a sequence [17]. For example, a sequence is noise if it is a negative example but is accidentally labeled as positive. In practice, such noise exists when logging errors occur. Several variants of existing state merging algorithms have been introduced to address noisy data. Many of them, such as Blue* [39], perform a statistical evaluation at each iteration of merging to select the merges that minimize the risk of overfitting.

2.3.3.5 Application domains

Process mining is widely applied in information systems to capture business-critical information. It is mainly used to identify patterns for performance or efficiency analysis. Process mining has been used to discover the care process in hospitals and suggest improvements in IT support [31]. ASML has employed this technique to analyze the test process for scanners and detect bottlenecks therein [29]. Gupta et al. [16] have applied this technique to discover the bug-fixing process. For such applications, the traces are usually long and events often happen in parallel, which motivates the development of algorithms that cope with such concurrency.

State merging algorithms target learning behavioral models from software traces. Lorenzoli et al. [30] proposed the GK-tails algorithm to learn component interactions. Walkinshaw et al. [50] enhanced the QSM algorithm with a model checker for learning software behavior.

2.3.3.6 Conclusion

Process mining and state merging algorithms differ in terms of the approaches used, the input data, the model formalisms, and the applications. These differences largely influence the learned models. State merging algorithms serve the purpose of learning software behavioral models better. This type of algorithm derives from the concept of merging, which provides an intuitive way to control generalization. The exposed parameters of such algorithms are easy to understand and configure for non-experts. Our practical experience with process mining suggests that expert knowledge of the used algorithm is required for tuning parameters and interpreting the learned results. Improving usability for non-experts is one of the most important challenges [46]. Furthermore, the concurrency introduced by process mining algorithms does not conform to the actual behavior of software.

For the reasons above, state merging algorithms are more suitable for software behavior inference. However, the fitness and precision metrics from process mining are practical for evaluating learned models. Therefore, we attempt to leverage their individual strengths in our work.


                  Strengths                                 Drawbacks

Active learning   It learns the complete visible            It cannot capture the actual
                  behavior of the implementation.           communication between the system
                                                            and its environment.

Passive learning  It learns the recorded communication      The result is limited to the
                  between the system and its                observations in the input log.
                  environment.                              Hence, it is usually incomplete.

Table 2.1: Strengths and weaknesses of active and passive learning results for interface protocol inference

2.4 Complementary nature of active and passive learning results

In the preparation phase of our study [33], we applied both active and passive learning techniques to a small software system and observed that the learned behavioral results of active and passive learning were complementary. This leads to the basic hypothesis of our project: combining the results could produce a better solution for interface protocol inference. Here we argue for this hypothesis from a conceptual point of view.

The query-and-response mechanism ensures that the active learner exhaustively queries all combinations of words until the hypothesized model converges to the target model representing the SUL. Due to the online interactions, this technique can learn the externally visible behavior of the implementation. However, as the SUL is learned in an isolated environment, the result cannot represent the actual interactions between the SUL and its original system environment.

Passive learning constructs models that reflect how the environment interacts with the SUL. Although this technique also requires the execution of the SUL, there are two differences. First of all, the software is run within its system environment, and the interactions between the component and its environment are captured. Secondly, this technique conducts offline learning, which limits its results to the observations in the execution traces. As the complete set of execution traces is almost impossible to obtain, the learned model often represents only partial behavior.

Table 2.1 summarizes the strengths and drawbacks of active and passive learning for interface protocol inference. Based on this, we hypothesize that, by combining the learned results, a more accurate behavioral model describing the interface protocol can be derived.


Chapter 3

Interface protocol inference workflow

We propose an interface protocol inference workflow which shows how existing knowledge and our research results are used. We expect that this workflow is not specific to ASML and can be applied in other contexts as well. We present the application of this workflow in the ASML context in Chapter 6.

We divided the interface protocol inference workflow into three phases, namely the preparation, learning and combining phases. Figure 3.1 shows an overview of this workflow.

3.1 Preparation

The preparation phase prepares the inputs for learning behavioral models and combining the results. The inputs for this phase are software artifacts and the logging system. The outputs of this phase include a set of function signatures of the interface for which the behavioral protocol has to be inferred, and event logs containing the (time-stamped) function calls that actually happened while executing the software for a while.

Learning requires the set of function signatures, which can be extracted from software artifacts. The extraction might require semantic knowledge of the software files. For instance, the interface definition files of ASML software components specify the generic functions on the interfaces together with generation options. The generation options decide how concrete functions are generated. Extracting concrete function signatures from such files requires semantic knowledge to translate the generic function signatures into concrete ones. For passive learning, the event log that records the execution of the SUL should be queried.

3.2 Learning

In the learning phase, the first step is to establish a joint abstraction level for active and passive learning. Section 4.1 explains this challenge in detail. The aim of establishing a proper abstraction is to hide certain details and construct an interpretable model. The output of this activity is a mapping between the alphabets used to construct automata and the concrete function calls on the real system. This activity can be challenging, as both active and passive learning have constraints on which abstraction level is learnable for them. As the employed L* algorithm assumes that the system is deterministic, the abstraction should expose only deterministic behavior to the learner.


Figure 3.1: Overview of interface protocol inference workflow

However, a coarse abstraction can introduce nondeterminism. To address this problem, Howar et al. [22] proposed a technique to refine the abstraction to a learnable level during learning. With this refinement technique, L* starts learning from a given abstraction and refines it until nondeterminism is absent. For passive learning, the learnable abstraction depends on the granularity of the logs. To achieve the joint abstraction level, we suggest starting from a relatively coarse abstraction and applying the abstraction refinement technique to find the learnable abstraction for L*. One can then check whether passive learning can deal with that abstraction level, and decide if it should be refined to obtain more details. After this activity, there are separate paths for active and passive learning. Once the behavioral models are learned by these two techniques, a model quality control session follows.

3.2.1 Active learning

We identified three tasks for applying active learning: connecting the learner to the SUL, learning the behavioral model, and projecting the model on the interface.

• Connecting the learner to the SUL. This activity sets up the learning environment. First of all, the SUL should be isolated from its system environment. According to the established abstraction, an abstraction layer should be created. This layer wraps the SUL, triggering actions on the system and capturing actions from it. Moreover, as the SUL should be reset to its initial state after each query, one needs to configure the way to reset the system in the learner. During learning, the learner exhaustively queries the SUL. The online interaction mechanism of active learning requires the SUL to remain reactive even when illegal actions are triggered. Our experience suggests that in some cases intervention on the SUL is required. The system should be able to notify the learner when encountering failures. This can usually be solved by modifying the failure handler (e.g., the exception handler).

• Learning the behavioral model. In this step the learner iteratively interacts with the SUL, proposes a hypothesis, gets it validated and refines it. The time for learning varies with the
complexity of the SUL and the efficiency of the used algorithm. The TTT algorithm [25], provided in the active learning tool LearnLib [37], is an enhancement of L* and is considered to be the most efficient one [42]. In addition, one needs to select and configure the method for conformance testing between the proposed hypothesis and the SUL. LearnLib provides several conformance testing methods. Random walk is the simplest one; it samples the test cases randomly. This method tends to generate the test traces that are easiest to reach on the hypothesis. It requires one to configure the maximum number of generated test cases. However, predicting a sufficient value is a non-trivial task for a complex system, so it performs badly on large models. There is a more advanced method called the W-method, which generates sufficient test cases for detecting differences between the hypothesis and the SUL, on the condition that one can estimate the maximum number of states of the SUL. We introduce this method in detail in Section 5.5.2.

• Projecting the model on the interface. This activity is specific to interface protocol inference. Active learning learns all the external behavior of the implementation, whereas the goal of interface protocol inference is to learn how the client uses the SUL via a specific interface. The interactions between the component and its servers via other interfaces are invisible actions for the interface protocol. Therefore, our interest lies in the part of the behavior shown on that particular interface. This means that a projection activity is required to hide the invisible behavior.
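At the trace level, this projection amounts to hiding events outside the interface alphabet, as the following sketch shows. The event names (`If.*` for interface calls, `Srv.*` for calls to other servers) are made up for illustration.

```python
def project_trace(trace, interface_alphabet):
    """Hide events that are not visible on the interface of interest,
    keeping only the calls that belong to the interface protocol."""
    return [e for e in trace if e in interface_alphabet]

# Hypothetical events: If.* are interface calls, Srv.* go to other servers.
trace = ["If.open", "Srv.init", "If.read", "Srv.log", "If.close"]
print(project_trace(trace, {"If.open", "If.read", "If.close"}))
# → ['If.open', 'If.read', 'If.close']
```

Projecting a whole model works analogously: the invisible actions are hidden, and the resulting automaton may then need to be simplified again.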

3.2.2 Passive learning

We identified three activities for using passive learning: preprocessing the log, analyzing the log, and learning the behavioral model.

• Preprocessing the log. The queried logs usually contain irrelevant information and are unstructured. The main objective of preprocessing is to process the logs in such a way that they are optimal for learning. First of all, to make them usable, parsing is required to extract information from an unstructured log and transform it into a standard log format acceptable to the desired passive learning tool. Secondly, to focus the learning scope, filtering is conducted to remove irrelevant information. To infer an interface protocol, one should filter out the events that are invisible on that particular interface. Moreover, to achieve the established abstraction, removing attributes (e.g., ignoring data parameters) or aggregating events (e.g., collapsing the start and end entries of an event together) might be required. Apart from that, if all the executions are logged in the form of one long sequence, log segmentation is needed to transform the long sequence into execution traces. Section 4.2.1 introduces log segmentation at ASML and its challenges.

• Analyzing the log. This activity aims at providing a descriptive analysis of the provided logs. The analysis gives an impression of the quality of the logs and whether they are sufficiently preprocessed. Several tools support this analysis. ProM [48], a process mining tool, integrates a visual analysis plug-in called Dotted Chart, which shows the distribution of events over time and helps humans identify patterns quickly. In addition, it is very useful to get a statistical overview of the data. ProM provides a log summary showing the number of traces, the start and end event of each trace, and the occurrence count of each event. Another process mining tool, Disco [15], offers an overview of the process. Unlike other process modeling languages (e.g., Petri nets), the fuzzy model used by Disco does not contain parallel gateways, retaining the actual sequential processes present in the log. Furthermore, checking the number of patterns and the frequency of each pattern gives a more comprehensive understanding of the logs. The number of patterns indicates the diversity of the logs and influences the completeness of the learned behavioral model. The frequency of patterns provides a different perspective. During this project, we had the chance to audit a stakeholder meeting where the process mining results for a traditionally engineered component
were presented to several developers who were working on modernizing this component and expected to extract knowledge from the learning results. We observed that the developers were interested in the frequent flows, as frequency indicates the significance of particular behavior.

• Learning the behavioral model. Since in reality the data source is an event log that contains only positive examples, the chosen passive learning algorithm should work without negative examples. As discussed in Section 2.3.3, process mining was developed under the assumption that negative examples are unavailable; therefore, only positive examples are used. However, using process mining is not recommended for the purpose of software behavior inference, because a lot of undesired concurrency is introduced in the process of mining. In the stakeholder meeting mentioned above, we observed that developers were confused by the learned concurrency, as it deviated considerably from their expectations. In contrast, state merging algorithms were developed within the software community. Several of them, such as K-tails and EDSM-Markov, support learning from only positive examples with a certain degree of imprecision.
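The parse/filter/segment steps of the preprocessing activity above can be sketched as follows. The raw log format, the event names, and the segmentation-by-start-event rule are all hypothetical simplifications of what a real logging system would require.

```python
import re

# Hypothetical raw log format: "<timestamp> <component> <function>".
RAW = """\
10:00:01 compA If.open
10:00:02 compA Srv.init
10:00:03 compA If.close
10:00:04 compA If.open
10:00:05 compA If.close
"""

def preprocess(raw, interface_prefix="If.", start_event="If.open"):
    # 1. Parse: extract the function name from each unstructured line.
    events = [re.match(r"\S+ \S+ (\S+)", line).group(1)
              for line in raw.splitlines()]
    # 2. Filter: keep only events visible on the interface of interest.
    events = [e for e in events if e.startswith(interface_prefix)]
    # 3. Segment: cut the long sequence into traces at each start event.
    traces, current = [], []
    for e in events:
        if e == start_event and current:
            traces.append(current)
            current = []
        current.append(e)
    if current:
        traces.append(current)
    return traces

print(preprocess(RAW))
# → [['If.open', 'If.close'], ['If.open', 'If.close']]
```

Real logs need more care (interleaved components, missing start events, aggregating start/end entries), which is exactly what makes segmentation at ASML challenging (Section 4.2.1).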

3.2.3 Quality Control

After executing the previous steps, the active and passive learning results are obtained and need to be evaluated. This quality control session is designed to address the challenges identified in Chapter 4. The remaining steps in this phase are used to control the model quality. Two activities are designed, namely checking fitness and measuring precision. We formulated these activities based on the study of the relations between the SUL, the learned models and the interface protocol, as discussed in Chapter 5. Both activities can be executed using ProM.

• Checking fitness. The measurement of fitness quantifies how well the model fits the log. ProM provides a plug-in to measure it. In this activity, the log is replayed on both the active and the passive learning result. In Chapter 5 we argue that correct active and passive learning results should be able to completely replay the program traces. Therefore, checking fitness on the active and passive learning results is a possible way to detect incorrect execution of the previous activities. Imperfect fitness of the active learning result indicates that the logs may not be correctly preprocessed or that the activities in the active learning path were not correctly executed. ProM provides counterexamples showing which traces are not perfectly aligned, which guides the user in detecting the root cause.

• Measuring precision. We measure precision to quantify how much unobserved behavior is shown in the model. Section 4.2.3 discusses the difficulty of evaluating the completeness of the provided log. The precision of the active learning result with respect to the log gives an indication of the completeness of the log or of the generalization of the implementation (an implementation may be too generic if the provided behavior is never used). In addition, one can analyze and compare the learned models by comparing their precision.

3.3 Combining

The last phase is the combining phase, in which the learned models obtained from the previous phase are compared and combined. We designed three activities for this phase, namely comparing behavioral models, classifying discrepancies and combining behavior. An introduction to these steps is given below; a detailed explanation is provided in Chapter 5.

• Comparing behavioral models. The main objective of this activity is to identify how the two learned models overlap and differ. The overlapping behavior is the part learned by both techniques; therefore, one can be more confident that this part of the behavior should be included in the interface protocol. The part of the behavior that the active and passive learning techniques classify differently is overapproximation, which should be further classified into valid and invalid groups.

14 Combining model learning results for interface protocol inference

• Classifying discrepancies between the results of active and passive learning. Third-party knowledge should be involved to classify the overapproximated behavior. In Chapter 5, we introduce a way to validate the overapproximation learned by the passive learner, and we incorporate knowledge from domain experts to classify the overapproximation learned by the active learner.

• Combining behavior. Once the overapproximated behavior is classified, all behavior can be identified as either accepted or rejected behavior of the intended interface protocol. This activity composes a model which allows exactly the accepted behavior and disallows all the rejected behavior.


Chapter 4

Challenges of interface protocol inference

Active and passive learning techniques face general challenges, such as inferring parameterized behavior, as mentioned in Chapter 2. For the purpose of interface protocol inference, the natural properties of the learned results are considered strengths and drawbacks, as listed in Table 2.1. This chapter, however, discusses the challenges we identified in applying the techniques for interface protocol inference.

We discuss the challenges based on our experience during this project; there may be more challenges that we did not encounter or foresee. The long-term goal of the interface protocol inference project is to infer interface protocols from traditionally engineered software by combining active and passive learning results. As the initial phase, during this graduation project, we used MDE-based software as the study object to derive approaches which are expected to be validated on traditionally engineered software in the future. We classify the challenges as those common in other inference tasks, those expected when dealing with ASML's traditionally engineered software, and those that are more specific to ASML but probably common in other companies as well.

4.1 Active learning

Connecting the software with the learner is a critical step prior to performing active learning, as this step determines the quality of the learned results. The code of MDE-based software is generated automatically and thus has a good structure (e.g., the function calls on interfaces are explicitly exposed), which makes the connection easier. By studying MDE-based software, we cannot identify the difficulties caused by the structure and complexity of software, but only the common difficulties that are independent of the development approach. Section 4.1.1 describes how an appropriate abstraction has been established for the studied systems, and the difficulties expected when coping with traditionally engineered software.

We introduce the challenge of ensuring the completeness of active learning results, and briefly discuss how other techniques can possibly address it, in Section 4.1.2. In our project we did not encounter this challenge, but it is expected to appear when dealing with a large system, as pointed out by Vaandrager [42]. Chapter 5 presents the proposed solution in detail.


Figure 4.1: Abstraction layer [22]

4.1.1 Establishing abstraction

The abstraction is a middle layer between the learner and the SUL. Establishing such a mapping is required when connecting the SUL to the learner, mainly for two purposes. The first purpose is to hide certain details of the system. For instance, some parameters (e.g., the processing identifier for a wafer) do not influence the software behavior and can therefore be treated symbolically. The other purpose is to scale active learning techniques to large applications. Aarts et al. [3] conducted case studies showing how they established an abstraction layer to translate between a large set of concrete actions of the system and a small alphabet of the learning algorithm. Figure 4.1 shows the idea of the abstraction. From the perspective of the learner, only the interactions with the abstraction layer are visible. The input and output alphabet symbols (Σ_A^I and Σ_A^O) used by the learner are mapped to the concrete input and output actions (Σ_C^I and Σ_C^O) of the SUL via the abstraction layer. Aarts et al. [2] define the notion of a mapper which allows back and forth translation between concrete actions and abstract alphabets.
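The essence of such a mapper can be sketched as two translation tables sitting between the learner and the SUL. This is an illustration in the spirit of the mapper notion of Aarts et al.; the class, the table contents and the SUL interface below are our own hypothetical assumptions, not an existing API:

```python
class Mapper:
    """Sketch of a mapper: translate abstract learner symbols to concrete
    SUL actions and concrete outputs back to abstract symbols.
    Tables and names here are hypothetical, for illustration only."""
    def __init__(self, abstract_to_concrete, concrete_to_abstract):
        self.a2c = abstract_to_concrete   # e.g. 'login' -> 'open_session'
        self.c2a = concrete_to_abstract   # e.g. 'open_session_ack' -> 'ok'

    def run(self, sul, abstract_inputs):
        """Execute one membership query: map each abstract input down to
        the SUL, and map the concrete output back up to the learner."""
        outputs = []
        for symbol in abstract_inputs:
            concrete_output = sul(self.a2c[symbol])
            outputs.append(self.c2a[concrete_output])
        return outputs
```

The learner thus only ever sees the small abstract alphabet, while the SUL receives concrete actions; refining the abstraction amounts to refining these two tables.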

In this project, the studied software components do not have parameterized behavior and are of a relatively small size (fewer than 10 function calls on the interface). The abstraction we have established for those components is governed by the following rules: 1) the abstraction function on inputs is an identity function, which means Σ_C^I = Σ_A^I; 2) the abstraction function on outputs abstracts the sequential output actions in Σ_C^O caused by an input symbol in Σ_A^I into a single output symbol in Σ_A^O. This mapping strategy is based on how MDE-based software components are modeled in terms of Mealy machines, where multiple output actions in response to one input action are encoded into one output symbol.

Establishing a proper abstraction can be challenging for several reasons. The L* algorithm can only cope with deterministic behavior, which requires that the abstraction presents a deterministic system to the learner. However, a coarse abstraction can introduce nondeterminism. This problem is considered a major obstacle for applying active learning in industrial applications [42]. For example, the learner may observe nondeterminism when the abstraction ignores a parameter that occurs in input and output actions. To address this problem, Aarts et al. [2] proposed a way to refine a given abstraction when it causes nondeterminism during the learning process. However, incorporating this suggestion in tools and demonstrating its viability on industrial applications is still work in progress [42].

The challenge more specific to our research, combining results obtained from active and passive learning, is establishing a proper abstraction level for the two techniques. We have reasoned that the learning results of these two techniques are inherently complementary. This hypothesis assumes that the results are on the same abstraction level; more effort is needed to validate whether it still holds if the results abstract differently. In this project, we constrained our work to consider only learned models that are on the same abstraction level. This also means that the abstraction cannot be more detailed than that of the log, because passive learning cannot learn information which is more concrete than what has been logged. To produce comparable results, developers should discover an abstraction level that is learnable by both the applied active and passive learning algorithms.


As mentioned, establishing a proper abstraction level for actively learning a complex system is hard due to the lack of mature tools. The problem becomes simpler if abstraction refinement can be automated. We give suggestions on how to establish a joint abstraction if such automated tools are available. One can initialize the tool with a coarse abstraction and let it iteratively refine the abstraction level during active learning until the coarsest deterministic abstraction is found. One can then check whether this refined abstraction level can be learned from the logs, and decide whether the abstraction should be further refined to learn more concrete behavior. It is important to note that the granularity of logging is decided by the developers and by the timing criticality of the system. Intensive code instrumentation is not suitable for systems with hard real-time requirements, as the instrumentation might change the behavior of the system (e.g., missing a critical deadline).

4.1.2 Ensuring the completeness of learning results

The L* algorithm assumes the existence of an equivalence oracle which can check the equivalence between the proposed hypothesis (i.e., an intermediate result) and the SUL. In practice, this equivalence check is implemented using traditional conformance testing. The completeness and correctness of the learned models can be guaranteed if a sufficient test suite is devised; however, generating such a test suite can be challenging. The active learning tool LearnLib provides several test generators, such as random walk and the W-method. As mentioned in Section 3.2.1, the tests generated by random walk are biased towards easily reached paths, so the generated test suite can fail to detect minor differences. This generator is parameterized by the maximum number of generated test cases, and underestimating this parameter results in an insufficient test suite. The W-method is considered more advanced in the sense that it constructs tests systematically: it guarantees the sufficiency of the tests given an upper bound on the number of states of the SUL. This estimate is hard to make, especially for a large system, as developers usually know little about the system.
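The random walk strategy, and its inherent incompleteness, can be sketched as follows. This is our own simplified illustration (hypothetical function signatures, not LearnLib's API), modelling both hypothesis and SUL as functions from an input word to an output word:

```python
import random

def random_walk_test(hypothesis, sul, alphabet, n_tests=100, max_len=10, seed=0):
    """Random-walk equivalence check sketch: feed random input words to
    both the hypothesis and the SUL and return the first disagreement.
    With too few or too short tests, rarely reached differences are missed."""
    rng = random.Random(seed)
    for _ in range(n_tests):
        word = [rng.choice(alphabet) for _ in range(rng.randint(1, max_len))]
        if hypothesis(word) != sul(word):
            return word    # counterexample, fed back to the learner
    return None            # no difference found; NOT a proof of equivalence
```

Returning `None` only means no counterexample was found within the budget, which is exactly why the completeness of the learned model is not guaranteed.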

For all the reasons mentioned above, the completeness and correctness of the learning results are not ensured. The idea of our solution is to check the completeness of the learning results using the execution traces in the logs: refinement of the learning result is triggered if it does not include the behavior shown in the logs. This solution still cannot completely eliminate the problem, but it reduces the risk of missing behavior. We illustrate the solution in detail in Chapter 5.

4.2 Passive learning

Using passive learning techniques in the best way can be challenging. This section discusses several challenges in applying passive learning and their influence on the accuracy of the learned results. These challenges are not specific to interface protocol inference but also arise in other inference tasks.

4.2.1 Segmenting logs

Segmenting logs is a log preprocessing activity which partitions a log into multiple execution traces. The ASML production log records the usage of their machines in producing circuits: it is one long sequence of events recording the executions of the software. Passive learning algorithms, by contrast, take independent traces as input, where each trace is a sequence of events describing the life-cycle of one particular execution. The log therefore has to be segmented, i.e., correctly sliced into traces representing different executions. This activity is not specific to ASML. For example, model inference at the Michelin company, learning how a product is processed in the workshop, also requires segmenting the production logs generated by legacy production systems [12]. Those logs contain the physical location as an attribute of each event; domain knowledge is leveraged to identify the entry and exit points of the workshop, and the logs are then segmented into traces by the physical location attribute.

Figure 4.2: A state machine example for log segmentation

Figure 4.3: Passive learning result for the log {a, b, c, d}

We discuss the need to segment logs with an example. Consider the target state machine in Figure 4.2: in the initial state, either event a or event c may be executed, followed by event b or d respectively. Assume we have a log containing the sequence {a, b, c, d}, which records two separate executions {a, b} and {c, d}. If we feed the whole log to a passive learning algorithm, the learned result would show that events a, b, c and d can be executed sequentially, as shown in Figure 4.3. The result differs from the intended one because the algorithm cannot identify the different executions in the log and thus simply considers it a single execution trace.

We illustrate how different segmentation strategies influence the learning results with another small example. Consider a log {a, b, a, b, a, b}. If we segment it into the two traces {〈a, b, a, b〉, 〈a, b〉}, the learned model (Figure 4.4a) shows that events a and b can be executed alternately. However, if we divide the log into the traces {〈a, b〉³}, the learned model (Figure 4.4b) does not have the cyclic structure, because there is no indication in the log that event a can occur after executing b.

Designing segmentation strategies is therefore important for inferring models. At ASML, the applied strategy consists of two rules. First, a set of start events is provided and the log is divided at the start events. However, this alone is far from correct: using only this rule, the log {a, b, a, b, a, b} would be divided into {〈a, b〉³}, which might be incorrect as it cuts possible cyclic patterns. Therefore, a second rule is used to decide whether an occurrence of event b should be followed by event a or not. This rule requires a time threshold: if the time interval between b and the following a is longer than the threshold, they are considered to be produced by different executions. However, it is difficult to configure a proper time threshold. Given this practical difficulty, one cannot ensure the correctness of the segmented results.

(a) Passive learning result for {〈a, b, a, b〉, 〈a, b〉} (b) Passive learning result for {〈a, b〉³}

Figure 4.4: Different passive learning results for the log {a, b, a, b, a, b}
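The two ASML segmentation rules can be sketched as follows: cut before a start event only when the time gap to the previous event exceeds the threshold, so that short cyclic repetitions stay in one trace. The function and event encoding are our own illustration:

```python
def segment(log, start_events, gap_threshold):
    """Segment a timestamped event log into traces using the two rules
    described above: cut before a start event, but only when the time gap
    since the previous event exceeds the threshold. Each log entry is a
    (timestamp, event_name) pair. Choosing the threshold remains hard."""
    traces, current, prev_t = [], [], None
    for t, name in log:
        is_cut = (name in start_events and current
                  and prev_t is not None and t - prev_t > gap_threshold)
        if is_cut:
            traces.append(current)   # the gap signals a new execution
            current = []
        current.append(name)
        prev_t = t
    if current:
        traces.append(current)
    return traces
```

With a threshold of 5 time units, the log a(0), b(1), a(2), b(3), a(10), b(11) is split into 〈a, b, a, b〉 and 〈a, b〉: the short gaps keep the cycle together, while the long gap starts a new trace. A different threshold would yield a different, possibly incorrect, segmentation.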

4.2.2 Generating negative examples

As mentioned in Chapter 2, negative examples are used to avoid over-generalization in state merging algorithms. In most cases, event logs only contain the positive traces that have been executed; negative examples are hard to obtain. Even though domain knowledge can be injected to create some negative examples, it is usually impossible for developers to list a complete set of them.

Walkinshaw et al. [53] proposed to use program mutation to generate negative examples from software. First, they used automated tools to create mutants of a program and collected their execution traces. Analysis of the generated traces showed the weaknesses of this approach: the main problem is that a mutation does not necessarily change the order of events. They therefore derived an approach that mutates the positive traces (i.e., the event logs) directly. This approach requires domain knowledge, or inspection of the implementation, to derive a set of mutation rules describing the sequences of events that cannot happen. It is not well applicable in our context for several reasons. First, with limited expert knowledge, it is hard to define rules that fully cover the negative examples. Furthermore, code inspection probably requires a lot of effort, considering the complexity of the code. Moreover, by inspecting code, one can only derive rules describing the sequences of events that cannot be executed in the implementation, rather than in the intended interface protocol. Note that for traditionally engineered software, the implementation often does not conform exactly to the intended interface protocol, which makes this approach infeasible.

Since interface protocols capture the developers' intention, a solution to this problem is to query expert knowledge to classify the overapproximated behavior. The approach proposed in Chapter 5 leverages this idea: we compare the active and passive learning results in terms of behavior, identify the possibly overapproximated behavior, and then query experts for further classification.

4.2.3 Evaluating completeness of logs

The completeness of the logs largely influences the results: a very sparse data set does not bring valuable information. Completeness is determined by the number of different observations (i.e., pattern diversity) rather than by their frequency. The learned model will contain only partial behavior if few patterns are available, even if the frequency of each pattern is high. For this reason, it is valuable to check whether the provided logs exhibit sufficient patterns before applying algorithms to them.
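A rough, purely illustrative way to quantify pattern diversity is to count distinct trace variants and distinct event n-grams against the raw trace count; the function and the chosen indicators are our own sketch, not a standard metric:

```python
def pattern_diversity(traces, n=2):
    """Rough diversity indicators for a log: many traces but few variants
    or n-grams suggests high frequency with low pattern diversity, i.e.,
    a log that is likely incomplete in the sense discussed above."""
    variants = {tuple(t) for t in traces}                      # distinct traces
    ngrams = {tuple(t[i:i + n])                                 # distinct n-grams
              for t in traces for i in range(len(t) - n + 1)}
    return {'traces': len(traces),
            'variants': len(variants),
            'ngrams': len(ngrams)}
```

A log of three traces with only two variants, for instance, repeats behavior without adding new patterns; the frequency rises, but the diversity, and hence the value for learning, does not.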

Furthermore, the completeness of the data is useful information for balancing precision and generalization. We have to accept the fact that logs are inevitably incomplete: the task of passive learning is to infer information from finite and limited observations. A certain level of generalization contributes behavior that was not observed but can possibly happen based on the evidence in the log. The learned model represents the interface protocol badly when the generalization is either too high or too low. A model with a significantly high degree of generalization contains a lot of unobserved behavior, while a model with poor generalization (i.e., high precision) tends to enumerate the observations as transitions and states and infers very little extra behavior. Therefore, if the log is relatively complete, less generalization is needed; inversely, one may expect more inference (i.e., higher generalization) if the log is relatively sparse.


Evaluating the completeness of logs is thus useful for avoiding poor model quality and for guiding users in tuning the algorithms. This leads to the question of how to evaluate the completeness of logs. The evaluation is hard since the target interface protocol is hidden from us. Even though human knowledge is available in some cases, it is impractical to iterate over all the traces and give a quantitative conclusion manually.

This problem cannot be solved in a fully automated way, but some measurements can help humans make the judgment. We propose to measure the fitness and precision scores of the active learning result with respect to the log to estimate the completeness of the log. A perfect fitness score indicates that the log can be correctly aligned on the model; otherwise, it suggests incompleteness of the active learning result or incorrect log preprocessing, and the errors need to be repaired. If the fitness score is perfect but the precision score is significantly low, the traces are not diverse (i.e., they reflect only a few patterns) compared with the patterns shown in the model. As the model represents the implemented behavior, we can reason that in this case either the log is sparse or the implementation is very permissive. However, if the fitness is perfect and the precision is very high, this suggests that the log is relatively complete and the implementation does not have a lot of unused behavior.


Chapter 5

Combining approach

In this chapter we propose an approach to combine the learned results obtained from active and passive learning techniques. In order to derive the combining approach, we first formulate assumptions (Section 5.1). Based on these, we analyze the relations between the learned models and the log in terms of behavior, explaining the relations with different scenarios. We formulate three steps for the approach. The first step ensures that the active learning result includes the behavior observed in the log (Section 5.2). The second step classifies the overapproximation in the passively learned model (Section 5.3). The overapproximation of the actively learned model is then classified with the involvement of expert knowledge in the third step (Section 5.4). Finally, we discuss a possible approach to generate a finite set of representative traces from automata, so that the classification can be done on those traces (Section 5.5).

5.1 Assumptions

We start with a simple scenario as shown in Figure 5.1. The system consists of two components A and B. Component A uses the services provided by component B over the interface IF. The behavior of component B is not guarded by parameters.

For active learning, component B is isolated from component A and connected with the learner via a wrapper. The wrapper maps input and output symbols to the concrete function calls on interface IF. The output of L* is a Mealy machine which can be converted into a labeled transition system (LTS). M_AL/IF denotes the actively learned model in terms of an LTS.

For passive learning, we need to collect communication logs from actual executions. The communication log records the executions of function calls on interface IF. As discussed in Chapter 4, preprocessing the log correctly is a challenge, and incorrect preprocessing introduces errors into the learning result. In this chapter we derive the approach under the assumption that the input log is correctly preprocessed; robustness to incorrect preprocessing is considered future work. Log_IF is the input for passive learning algorithms such as state merging algorithms. The output of a state merging algorithm is a DFA. Recall that a sequence accepted by the DFA is a positive example, and any prefix of it is also a positive example. Considering this property, the output DFA can be converted to an LTS in which all states are accepting states. M_PL/IF denotes the passively learned model in terms of an LTS.

Figure 5.1: A simple system scenario

5.2 Step 1: ensuring the log is included in the active learning result

The log consists of program traces capturing the behavior of actual executions. The behavior shown in the log is part of the behavior of M_AL/IF under the assumption that the generated test cases are sufficient to detect the differences between the hypothesized model and the SUL. Scenario 1 in Figure 5.2 presents the relation between L(M_AL/IF) and Log_IF under this idealistic condition. However, this is not always guaranteed, especially for a large system [42], as explained in Section 4.1.2. Scenarios 2 and 3 illustrate the relations between L(M_AL/IF) and Log_IF when Log_IF ⊆ L(M_AL/IF) does not hold.

The non-empty set Log_IF \ L(M_AL/IF) in Scenarios 2 and 3 explicitly indicates the incompleteness of the active learning result, as it misses behavior observed in the execution log. To address this problem, we derive Algorithm 1 to refine the active learning result so that it includes the behavior shown in Log_IF.

Algorithm 1 Algorithm to ensure that Log_IF is included in M_AL/IF

1: function EnsureLogInclusion(M_AL/IF, Log_IF)
2:     while CheckFitness(Log_IF, M_AL/IF) < 1 do
3:         tr = ReturnCounterExample(Log_IF, M_AL/IF)    ▷ Pick a trace in Log_IF which is rejected by M_AL/IF
4:         M_AL/IF = RefineALResult(M_AL/IF, tr)    ▷ Provide the counterexample to L*
5:     return M_AL/IF

The inputs for this algorithm are M_AL/IF and Log_IF. The CheckFitness function returns the fitness score of M_AL/IF with respect to Log_IF; the score ranges between 0 and 1, and equals 1 if and only if Log_IF ⊆ L(M_AL/IF) holds. Function ReturnCounterExample picks a trace in Log_IF which is rejected by M_AL/IF; this trace is treated as a counterexample. Function RefineALResult lets the active learner refine M_AL/IF with the counterexample: the learner generates membership queries to the SUL based on the provided counterexample, and the active learning process iterates until the new hypothesis conforms to the SUL. The algorithm terminates when CheckFitness returns a perfect fitness score. The resulting active learning model is denoted by M′_AL/IF.
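The refinement loop of Algorithm 1 can be sketched in Python. The three helper functions are assumptions standing in for the real tooling (alignment-based fitness in ProM, counterexample-driven refinement in L*), injected as parameters so the skeleton itself is runnable:

```python
def ensure_log_inclusion(model, log, check_fitness, counterexample, refine):
    """Sketch of Algorithm 1. Assumed helper contracts:
    check_fitness(log, model) -> score in [0, 1], 1 iff the log is included;
    counterexample(log, model) -> a log trace the model rejects;
    refine(model, trace) -> refined model (in practice, L* membership
    queries driven by the counterexample)."""
    while check_fitness(log, model) < 1:
        tr = counterexample(log, model)   # trace observed but not learned
        model = refine(model, tr)         # fold the missing trace back in
    return model
```

Modelling a "model" as a plain set of accepted traces shows the loop's effect: starting from a model accepting only ⟨a⟩ and a log containing ⟨a⟩ and ⟨b⟩, the loop terminates once both traces are included.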

After executing this step, the refined active learning result is guaranteed to include Log_IF. The previous relations between M_AL/IF and Log_IF shown in Scenarios 2 and 3 change to the relation shown in Scenario 1, where Log_IF ⊆ L(M′_AL/IF) holds. Figure 5.3 shows the changed relations between the language of the initial active learning result L(M_AL/IF), the language of the refined active learning result L(M′_AL/IF), and Log_IF. The provided counterexample guides the learner in refining the active learning result. The refinement may recover missed behavior or restrict imprecise behavior; therefore, the sets L(M_AL/IF) \ L(M′_AL/IF) and L(M′_AL/IF) \ L(M_AL/IF) can both be non-empty.

Figure 5.2: The relations between M_AL/IF and Log_IF

Figure 5.3: The relations between L(M_AL/IF), L(M′_AL/IF) and Log_IF

5.3 Step 2: classification of overapproximation within the passive learning result

After executing the previous step, M_AL/IF includes the behavior shown in the log. In this step, we classify the behavior that is included in the passive learning result but not observed in the log (i.e., L(M_PL/IF) \ Log_IF); this behavior is called the overapproximation of the passive learner. The desired end result of this step is that the overapproximated behavior is either added to the active learning result, in case it is accepted by the SUL, or removed, in case it is not accepted by the SUL. L(M_PL/IF) \ L(M_AL/IF) represents the behavior suggested by the passive learner but not accepted by M_AL/IF. It is possible that some behavior in this set was missed by active learning due to insufficient tests. To reduce the risk of missing behavior, we use this set of traces to refine the active learning result. Algorithm 2 explains how this refinement is done.

Algorithm 2 Algorithm to classify overapproximation within L(M_PL/IF) \ L(M_AL/IF)

1: function ClassifyOverapproximationPL(M_AL/IF, M_PL/IF, MaxNumberOfTrace)
2:     Traces = GetTraces({t ∈ M_PL/IF ∧ t ∉ M_AL/IF}, M_AL/IF, M_PL/IF, MaxNumberOfTrace)
3:     foreach tr ∈ Traces do
4:         if isAcceptedBySUL(tr) then    ▷ tr was not in M_AL/IF but is accepted by the SUL, so it should be included in the learned result
5:             M_AL/IF = RefineALResult(M_AL/IF, tr)
6:             return ClassifyOverapproximationPL(M_AL/IF, M_PL/IF, MaxNumberOfTrace − 1)
7:     return M_AL/IF

The inputs for this algorithm are M_AL/IF (which now includes Log_IF), M_PL/IF and a non-negative variable MaxNumberOfTrace. Function GetTraces returns a set of traces in which each trace t satisfies the condition t ∈ L(M_PL/IF) \ L(M_AL/IF). The number of traces satisfying this condition can be infinite; variable MaxNumberOfTrace limits the generation to a finite set so that termination of the algorithm is enforced. This problem can also be addressed by using model-based testing algorithms to generate a finite set of traces satisfying the condition; we discuss such algorithms in detail in Section 5.5.
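One possible realization of such a bounded GetTraces, sketched under our own encoding (models as deterministic (state, event) → state maps with every state accepting), is a breadth-first search over the product of the passive model with the complement of the active one: a trace qualifies as soon as the active model has no matching transition, and since the active model is prefix-closed, every extension of a rejected prefix is also rejected:

```python
from collections import deque

def traces_in_difference(m_pl, m_al, init_pl, init_al, max_traces, max_len=10):
    """BFS sketch enumerating up to max_traces traces accepted by the
    passive model but rejected by the active one (L(M_PL) \\ L(M_AL)).
    s_al = None marks that the active model has already rejected the prefix."""
    found = []
    queue = deque([(init_pl, init_al, [])])
    while queue and len(found) < max_traces:
        s_pl, s_al, prefix = queue.popleft()
        if len(prefix) >= max_len:          # length bound enforces termination
            continue
        for (s, e), t in m_pl.items():
            if s != s_pl:
                continue
            if s_al is None or (s_al, e) not in m_al:
                found.append(prefix + [e])  # accepted by PL, rejected by AL
                queue.append((t, None, prefix + [e]))
            else:
                queue.append((t, m_al[(s_al, e)], prefix + [e]))
    return found
```

For a passive model with the cycle 0 --a--> 1 --b--> 0 and an active model knowing only 0 --a--> 1, the shortest discrepancy found is ⟨a, b⟩, exactly the kind of trace Algorithm 2 submits to isAcceptedBySUL.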

The foreach loop in lines 3-6 iterates over each trace. Function isAcceptedBySUL queries the SUL to determine whether the sequence of function calls in the trace can be executed on the SUL. If the answer is no, nothing has to be done with this trace. If trace tr is accepted by the SUL, the trace is a counterexample that distinguishes M_AL/IF from the SUL: it is accepted by the SUL but not included in M_AL/IF. The active learning result is then refined by function RefineALResult, as in Algorithm 1. Next, the algorithm is called recursively on the refined result and the passive learning result; the recursion updates the search space of the algorithm. The algorithm terminates when either all generated traces are rejected by the SUL or variable MaxNumberOfTrace is reduced to zero. The resulting active learning model is denoted by M″_AL/IF.

Figure 5.4: The relations between L(M″_AL/IF), L(M′_AL/IF), L(M_AL/IF) and Log_IF

Figure 5.4 shows the changed relations. L(M_AL/IF) denotes the scenario in which the log is not included in the active learning result, and L(M′_AL/IF) denotes the refinement of L(M_AL/IF) after executing step 1. As in the previous step, refinement with a counterexample in step 2 can reduce some permissiveness and probably add more behavior to the result; this means that the sets L(M″_AL/IF) \ L(M′_AL/IF) and L(M′_AL/IF) \ L(M″_AL/IF) might not be empty. The behavior present in L(M_PL/IF) ∩ L(M′_AL/IF) is generalized by the passive learner. In this step the active learner infers the behavior present in L(M_PL/IF) ∩ L(M″_AL/IF) \ L(M′_AL/IF), thus validating some generalizations. The traces invalidated by function isAcceptedBySUL represent invalid generalizations. However, there may remain a portion of uncertain behavior due to the threshold MaxNumberOfTrace.

5.4 Step 3: classification of overapproximation within the active learning result

After executing the previous steps, MAL/IF includes the behavior shown in LogIF and the valid generalizations shown in MPL/IF. The interface protocol describes how the client is using the SUL over the interface. We use MIF to denote the model that represents the interface protocol and L(MIF) to denote the language of that model. The brightness of the colors shown in Figure 5.5 indicates the confidence of each part being a subset of L(MIF). LogIF is part of L(MIF) as it was generated from software execution. Passive learners generalized the behavior from LogIF. The behavior present in (L(MAL/IF) ∩ L(MPL/IF)) \ LogIF is valid generalization, as it was also discovered by the active learning technique. Therefore, we consider the intersection of the two learned models (i.e., L(MAL/IF) ∩ L(MPL/IF)) very likely to be part of L(MIF). In the previous step, we identified that the behavior described by L(MPL/IF) \ L(MAL/IF) is invalid generalization and should be excluded from L(MIF).

The behavior present in L(MAL/IF) \ L(MPL/IF) is also considered promising because of its existence in the implementation. However, considering that traditionally engineered software usually contains a certain permissiveness, it is less evident than the intersected set. In this step we involve domain knowledge to classify the overapproximated behavior present in L(MAL/IF) \ L(MPL/IF).

The algorithm for this classification is shown in Algorithm 3. The inputs for this algorithm are MAL/IF, MPL/IF and LogIF. Variable Conditions encodes the conditions for trace generation.


Figure 5.5: Confidence in the interface protocol behavioral inclusion

The output of function GetTraces is four sets of traces, namely PTr, UnresolvedTr, InvalidGenTr and NTr. Each set of traces satisfies the corresponding condition in variable Conditions. Variable MaxNumberOfTraces is again used to control the number of generated traces for each set. The traces in InvalidGenTr are considered to be negative and thus added to NTr. The traces in LogIF are positive and added to PTr in line 5.

The foreach loop in lines 6-11 iterates over each trace in UnresolvedTr. The classification query is posted to experts. Once a trace is classified as negative, it is added to NTr. Otherwise it is added to PTr. After classifying all the unresolved traces, we input NTr and PTr to a passive learning algorithm that requires both positive and negative examples (e.g., RPNI, Blue-Fringe and EDSM). This guarantees that the resulting MComb accepts all the identified positive traces and rejects all the negative ones. Note that such passive learning algorithms generalize behavior from the provided positive examples and use the negative examples to avoid overgeneralization. The produced model can contain a certain degree of overapproximation because of the limited number of examples provided by function GetTraces under variable MaxNumberOfTraces.

Algorithm 3 Algorithm to classify overapproximation within MAL/IF \ MPL/IF

1:  procedure ClassifyOverapproximationAL(MPL/IF, MAL/IF, LogIF, MaxNumberOfTrace)
2:      Conditions = {t ∈ MAL/IF ∧ t ∈ MPL/IF, t ∈ MAL/IF ∧ t ∉ MPL/IF, t ∉ MAL/IF ∧ t ∈ MPL/IF, t ∉ MAL/IF ∧ t ∉ MPL/IF}
3:      (PTr, UnresolvedTr, InvalidGenTr, NTr) = GetTraces(Conditions, MAL/IF, MPL/IF, MaxNumberOfTrace)
4:      NTr.add(InvalidGenTr)
5:      PTr.add(LogIF)
6:      for each tr in UnresolvedTr do
7:          U : User Prompt                ▷ Ask Experts: Is this trace positive?
8:          if U == no then
9:              NTr.add(tr)                ▷ Add the trace to the negative trace set
10:         else
11:             PTr.add(tr)                ▷ Add the trace to the positive trace set
12:     MComb = PassiveLearningAlgorithm(PTr, NTr)    ▷ The algorithm infers models from positive and negative examples, e.g., RPNI, Blue-Fringe and EDSM
13:     return MComb
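The classification loop of Algorithm 3 can be sketched in Python as follows. This is a sketch under stated assumptions: get_traces stands in for GetTraces (real trace generation works on the models, e.g. via the W-Method) and ask_expert stands in for the classification query posted to experts; the final passive learning step is left out and the labelled trace sets are returned instead.

```python
def classify_overapproximation(get_traces, ask_expert, log_if, max_traces):
    """Sketch of Algorithm 3. get_traces and ask_expert are hypothetical
    stand-ins supplied by the caller."""
    # GetTraces returns one trace set per condition in variable Conditions.
    p_tr, unresolved_tr, invalid_gen_tr, n_tr = get_traces(max_traces)

    n_tr = list(n_tr) + list(invalid_gen_tr)  # invalid generalizations: negative
    p_tr = list(p_tr) + list(log_if)          # logged traces: positive

    for tr in unresolved_tr:                  # lines 6-11: expert queries
        (p_tr if ask_expert(tr) else n_tr).append(tr)

    # A real implementation would now run a passive learner that uses both
    # positive and negative examples (RPNI, Blue-Fringe, EDSM) on (p_tr, n_tr).
    return p_tr, n_tr

pos, neg = classify_overapproximation(
    get_traces=lambda n: ([('a', 'b')], [('b', 'a')], [('a', 'a')], []),
    ask_expert=lambda tr: tr == ('b', 'a'),   # toy expert: accepts ('b','a') only
    log_if=[('a',)],
    max_traces=4)
print(pos)  # → [('a', 'b'), ('a',), ('b', 'a')]
print(neg)  # → [('a', 'a')]
```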

5.5 Generating traces from automata

As mentioned in Section 5.3, the number of traces satisfying the conditions specified in GetTraces can be infinite. In this section we discuss a possible approach from the model-based testing domain for implementing function GetTraces such that it generates finite sets of traces that satisfy the specified conditions.


5.5.1 Model-based testing

Model-based testing checks the conformance between specifications, which are represented by models, and the implementation [41]. Test cases are generated from the specification and run against the implementation. The generated test cases fall into two categories: those that test the presence of desired behavior, and those that test the absence of undesired behavior. The implementation under test is a black box in the sense that one can only check whether it reacts correctly (i.e., produces an expected sequence of outputs) to a sequence of inputs. An implementation is considered correct if it reacts to all the test sequences exactly as expected. Otherwise the developer needs to further examine the counterexample. The model often contains cyclic structures, representing an infinite language. In such cases, it is not practical to generate an infinite set of test cases from the specification and run all of them against the implementation. The solution in this field is to construct a representative set of test cases from the language of the specification.

Walkinshaw et al. [51] leverage a test case generator from the model-based testing domain to compare models. In their scenario, both the specification and the implementation are in the form of a model. The test case generator characterizes the reference model (i.e., the specification) with a finite set of traces. Then, with the subject model (i.e., the implementation), the generated traces are categorized into the groups shown in Table 5.1.

To overcome the infinite-language problem and still detect the overlaps and differences between models, we can adapt Walkinshaw's work to our purpose. Our scenario differs significantly from Walkinshaw's: in our problem, we do not have the distinction between a subject model and a reference model. In light of this difference, we remove the required distinction by swapping the roles of the input models.

The following section illustrates a test generator called the W-Method, its use for solving our problem, and the way to interpret the results.

5.5.2 W-Method

The W-Method is a model-based test generator which is used to characterize the specification. It is important to note that the traces generated by the W-Method not only represent the specification but also attempt to distinguish it from its implementation. In the context of model-based testing, the implementation is hidden and can contain permissive behavior resulting in extra states. To tackle this issue, the W-Method requires one to estimate the maximum number of extra states in the implementation, so that it can generate a set of traces that is guaranteed to explore every extra state in the implementation. Making this estimate can be nontrivial, especially when developers know very little about the implementation. Fortunately, this is not the case in our scenario, where the inputs are two visible models. The number of extra states is the (absolute) difference of the numbers of states in these two visible models.

The set of test traces t constructed by the W-Method is the cross product of three sets of sequences, namely the State Cover Set, the Symbol Permutations Set and the Characterization Set. Here we informally describe these sets; for the formal definitions we refer to Walkinshaw's paper [51]. The State Cover Set C is a prefix-closed set which contains all the sequences required to explore every state of the specification from the initial state. This set of sequences is used to reach every state expected by the specification. However, as explained, the implementation can have hidden states that do not exist in the specification. The W-Method takes a parameter k, which is the number of potential extra states in the implementation. The Symbol Permutations Set is constructed using the value of k: {ε} ∪ Σ ∪ ... ∪ Σ^(k+1), where Σ is the set of symbols used in the specification. By taking the cross product of the State Cover Set and the Symbol Permutations Set, we obtain test cases in C × ({ε} ∪ Σ ∪ ... ∪ Σ^(k+1)), which ensures that the test cases exhaustively examine not only the expected states but also k extra states. However, this can still be insufficient, as it cannot ensure that the traces reach the intended states. It is possible that the state reached by a trace in the implementation is


(a) Specification (b) State machine for the implementation

Figure 5.6: An example for explaining W-Method

not the state intended to be reached by this trace in the specification. The Characterization Set W is computed to eliminate this problem. This set of sequences is used to distinguish every pair of states, which makes sure the reached states are the intended ones. By appending this set of sequences, the test traces are constructed as t = C × ({ε} ∪ Σ ∪ ... ∪ Σ^(k+1)) × W. This set is able to expose the differences between the specification and the implementation.

A simple example (Figure 5.6) is provided to show how the W-Method works. Assume that we have a specification (Figure 5.6a) which has two states and shows that events a and b are executed in an alternating way. It has an implementation which is represented by the state machine shown in Figure 5.6b.

The State Cover Set for this specification is {ε, a}, which consists of sequences that can reach states 1 and 2. The parameter k is the number of extra states in the implementation. The implementation has two states as well; therefore, k equals zero. First, we construct the cross product of the State Cover Set and the Symbol Permutations Set: {ε, a} × ({ε} ∪ {a, b}) = {ε, a, b, aa, ab}. Each model classifies the generated traces into two categories: those accepted by the model and those rejected by the model. These two models have the same classification results for each trace in this set. Thus, the set fails to detect the difference between the specification and the implementation. The input sequence ab reaches state 2 in the implementation but not the intended state 1 in the specification. This illustrates why the Characterization Set is essential. The Characterization Set that distinguishes states 1 and 2 is either {a} or {b}. Here we select {a} as the W set to construct the test cases t. By appending the W set, we construct the complete set of test cases: t = {ε, a} × ({ε} ∪ {a, b}) × {a} = {a, aa, ba, aaa, aba}. The test sequence aba distinguishes these two models as it is classified differently.
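The example above can be reproduced with a few lines of Python. This is a sketch: the specification is the alternating machine of Figure 5.6a, while the implementation is an assumption (a plausible machine consistent with the text, in which ab ends in state 2 instead of state 1).

```python
from itertools import product

def accepts(dfa, start, trace):
    """Run a trace on a partial DFA given as {(state, event): next_state};
    a missing transition means the trace is rejected."""
    state = start
    for ev in trace:
        if (state, ev) not in dfa:
            return False
        state = dfa[(state, ev)]
    return True

# Specification of Figure 5.6a: events a and b strictly alternate.
spec = {(1, 'a'): 2, (2, 'b'): 1}
# Assumption: an implementation matching the text, where after a it keeps
# accepting b, so the trace ab ends in state 2 rather than state 1.
impl = {(1, 'a'): 2, (2, 'b'): 2}

state_cover = ['', 'a']       # reaches spec states 1 and 2
middles = ['', 'a', 'b']      # {ε} ∪ Σ, since k = 0
w_set = ['a']                 # distinguishes spec states 1 and 2

tests = sorted({c + m + w for c, m, w in product(state_cover, middles, w_set)})
print(tests)  # → ['a', 'aa', 'aaa', 'aba', 'ba']

diff = [t for t in tests if accepts(spec, 1, t) != accepts(impl, 1, t)]
print(diff)   # → ['aba']
```

As in the text, only the appended Characterization Set makes the test suite expose the difference: every shorter test is classified identically by both machines.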

5.5.3 Function GetTraces

The inputs for function GetTraces are the variables Conditions and MaxNumberOfTraces. Variable MaxNumberOfTraces is used to constrain the number of generated traces. In this section, we discuss how the W-Method can be leveraged for our purpose.

The W-Method requires an explicit distinction between subject (i.e., implementation) and reference (i.e., specification). It is important to recall that the test case generation is not independent of the subject. Instead, the test sequences are generated from the reference with respect to the subject (i.e., the value of k still depends on the subject). Because of that, the generated sequences are guaranteed to differentiate the subject and the reference. As mentioned, we do not have the concepts of subject and reference in our scenario. Therefore, we apply the W-Method to characterize the learned behavioral models with respect to each other. In this way, we can collect more evidence of their overlaps and differences.

We divide the trace generation into two steps. First, we apply the W-Method to generate traces from M1 with respect to M2. In this case, M1 plays the role of the reference model from which sequences are generated, while M2 is the subject model. In the second step we swap the roles of


                                Classification by M2
                            tr ∈ L(M2)    tr ∉ L(M2)
Classification   tr ∈ L(M1)      1             2
by M1            tr ∉ L(M1)      3             4

Table 5.1: Matrix for trace categories

M1 and M2 and again apply the W-Method. Once these two steps are finished, we can categorize all the generated traces in terms of the matrix shown in Table 5.1.

Variable Conditions is used to specify which categories of traces should be returned. As an example, the traces in category 1 will be returned if the condition tr ∈ L(M1) ∧ tr ∈ L(M2) is given.

It is important to know a feature of the traces generated by model-based testing algorithms. These algorithms iterate over each event of a trace on the subject and reject the trace instantly when a deviation occurs. Therefore, one can easily see which event causes the deviation when comparing two models in terms of the matrix. As an example, a trace {Z, X, Y} in category 2 is accepted by M1 and rejected by M2. This means that its prefix {Z, X} is accepted by M2 but the following event Y causes the rejection. This feature helps experts to identify the undesired behavior.
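The matrix lookup and the deviation feature can be sketched as follows. The two one-line "models" at the bottom are toy stand-ins for learned models, used only to illustrate the {Z, X, Y} example from the text.

```python
def categorize(trace, in_m1, in_m2):
    """Category of Table 5.1 from membership in L(M1) and L(M2)."""
    return {(True, True): 1, (True, False): 2,
            (False, True): 3, (False, False): 4}[(in_m1(trace), in_m2(trace))]

def deviation_event(trace, accepts):
    """First event whose addition makes the prefix rejected, or None
    if the whole trace is accepted."""
    for i in range(1, len(trace) + 1):
        if not accepts(trace[:i]):
            return trace[i - 1]
    return None

# Toy prefix-closed models: M1 accepts prefixes of zxy, M2 only of zx.
m1 = lambda tr: 'zxy'.startswith(tr)
m2 = lambda tr: 'zx'.startswith(tr)

print(categorize('zxy', m1, m2))   # → 2 (accepted by M1, rejected by M2)
print(deviation_event('zxy', m2))  # → y (the prefix zx is still accepted)
```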

5.5.4 Limitations

The W-Method requires the input models to be minimal and deterministic. In our use case, the input models are the active and passive learning results. The active learner L* guarantees that the learned models are minimal and deterministic. For passive learning, the properties of the results depend on the algorithms used. Most state-merging algorithms satisfy this requirement. The algorithms from process mining do not enforce such properties on the learned results. Therefore, applying the W-Method limits our choice of passive learning algorithms.

5.6 Discussion and related works

The approach described in this chapter infers an interface protocol of a component-based software system by iteratively classifying the behavior shown in active and passive learning results with the involvement of expert knowledge. To the best of our knowledge, this is the first approach that combines active and passive learning results for interface protocol inference. However, there are some related works that also involve expert knowledge in learning.

The QSM algorithm [11] has been developed for learning a behavior model of the SUL from positive and negative traces and from queries posted to experts. This algorithm is also built on top of the RPNI algorithm. It constructs a PTA from the initial positive traces and then queries expert knowledge to validate state merges. We expect two limitations of this approach in the application of interface protocol inference. First of all, this algorithm requires that the initial set of positive examples satisfies the first condition of the notion Characteristic. This is hard to guarantee in practice because the available trace set is usually incomplete. Secondly, during the process of learning, the experts are asked to classify the traces introduced by a merge. This kind of query is difficult to answer, as experts might be unfamiliar with the software behavior. Similarly, the algorithm provided by Hammerschmidt et al. [19] allows experts to interactively perform state merging and/or unmerging on the intermediate results, and expects them to finally produce the target state machine by doing those operations. We also consider this a tough task, as experts usually know very little about the target state machine.


The novelty in our approach is to analyze the relations between active and passive learning results. By providing suggestions from different perspectives (i.e., implemented behavior and used behavior), we expect that it can give experts more insight into the software and help them make the classification. We discuss possible improvements in future research in Chapter 7.


Chapter 6

Case study: interface protocol inference at ASML

In previous chapters we discussed the proposed interface protocol inference workflow and an approach to combine learned results. This chapter discusses the feasibility of the proposed approaches in the ASML context. First of all, we studied the traditional and MDE software development paradigms within ASML. This study illustrates the differences between these two paradigms and their influence on interface protocols (Section 6.1). Then we selected a component as the SUL (Section 6.2). In the case study, we applied the proposed workflow to the selected SUL. The execution of the workflow and its (intermediate) results are explained (Section 6.3). At the end of this chapter, we summarize the lessons learned from this case study (Section 6.4).

6.1 Interface protocols at ASML

The choice of traditional or MDE-based development impacts the way interface protocols are specified. We summarize the understanding of interface protocols in the traditional coding paradigm and introduce the modeling of interface protocols in an MDE tool called Analytical Software Design (ASD):Suite [1], which is used within ASML software development teams.

6.1.1 Traditional software development

We interviewed three ASML software architects during our preparation study [33], discussing the development process of the traditional software engineering paradigm.

6.1.1.1 Development process

Figure 6.1 shows the software development process within ASML. When a software component is initially designed, developers create several documents before starting implementation. First of all, the document Software specification is created, specifying the services that this component provides. After that, the Test specification details all tests for the component. After finishing these two documents, engineers who work on the components interfacing with this component review the documents and check whether the services are correctly specified and whether the test suite is sufficiently complete. Then the documents are updated according to the reviewers' feedback. Once they reach agreement on all the designs and specifications, developers conduct the implementation of both the test suite and the software component. Before coding, the Interface definition


Figure 6.1: Traditional development process

files are created, which specify the methods of each interface. After finishing the implementation, testing is conducted.

There are three main categories of testing conducted within ASML, namely unit testing, boundary testing and system testing. Unit testing and most of the boundary testing are conducted within the component development group, while system testing is handled by separate integration groups. After running the created test suite, ASML's automated testing framework produces a Testing result Report that provides the outcome of each test. Based on an analysis of the results, developers either go back to the implementation phase to fix bugs, or finish the development and deliver the component to the integration group. Even though the software is delivered, development does not simply end there. Sometimes software is shipped back because of test failures at system level, which requires modification and regression testing.

When refactoring, developers only update the specification, and run progression and regression test cases after the modification. Obviously, when a functional modification is conducted, all the development activities need to be repeated on a smaller scale, which requires engineers to update all the documents, generate extra test cases for the new functionality and test all the cases after the modification. In case the code is too complex to add functionality to, the code is refactored. To ensure that refactoring does not introduce bugs, regression testing should be conducted before further modification. However, it is possible that such an ideal development process is not strictly


(a) Interface protocol (b) Implementation

Figure 6.2: Non-conformity between interface protocol and implementation

followed. For example, refactoring and adding new functionality are combined without conducting regression testing in between. This is a common risk appearing in many companies [20], as discussed in interviews conducted at Microsoft [26].

Moreover, the correctness of software behavior is not formally verified. Therefore, it is very likely that the implementation does not exactly match what has been specified. This non-conformity might exist in the system and cause failures during evolution. Figures 5.1 and 6.2 together illustrate this scenario.

Assume that there is a simple system (Figure 5.1) in which component A uses the services provided by component B via interface I. Component B is expected to be used according to its interface protocol (Figure 6.2a), which calls functions α and β in an alternating manner. However, the implementation of component B (Figure 6.2b) allows, in the initial state 1, executing functions α and β in any order, and functions f and g alternately. The loose implementation of component B cannot be found if the test cases do not cover it. The whole system can function properly if component A strictly uses component B as intended. In fact, as individual components evolve, along with the whole system, there is no guarantee that all the components will always preserve their behavior as intended (i.e., poor predictability). For example, assume that function α can trigger a physical action of the hardware controlled by component B, and component A somehow mistakenly calls an illegal function sequence αα. In this case, the permissive implementation of component B will allow the sequential execution of function α and thus lead to a physical failure. For high-tech companies, such as ASML, which are expected to produce machines that function precisely, this potential failure is a risk.
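The non-conformity above can be made concrete with two small state machines. This is an illustrative sketch, not the actual component: the events a and b stand in for α and β, and the implementation's transitions are an assumption in the spirit of Figure 6.2b.

```python
def accepts(dfa, start, trace):
    """Run a call sequence on a partial DFA {(state, event): next_state};
    a missing transition means the sequence is rejected."""
    state = start
    for ev in trace:
        if (state, ev) not in dfa:
            return False
        state = dfa[(state, ev)]
    return True

# Interface protocol of component B (Figure 6.2a): a and b must alternate.
protocol = {(1, 'a'): 2, (2, 'b'): 1}
# Assumption: an illustrative permissive implementation in the spirit of
# Figure 6.2b: a and b in any order from state 1, f and g alternating.
implementation = {(1, 'a'): 1, (1, 'b'): 1, (1, 'f'): 2, (2, 'g'): 1}

# The illegal call sequence aa violates the protocol but slips through:
print(accepts(protocol, 1, ['a', 'a']))        # → False
print(accepts(implementation, 1, ['a', 'a']))  # → True
```

Any test suite that never exercises the sequence aa would leave this permissiveness undetected, which is exactly the risk described above.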

This scenario explains why ASML has shifted its preference toward MDE, such that the behavior of software can be precisely specified and formally verified. Furthermore, this scenario also suggests that applying active learning techniques alone can fail to infer interface protocols, since this type of technique only learns the behavior that was implemented rather than what was actually intended.

6.1.1.2 Software artifacts

The software artifacts of an ASML component have been studied. The main goal of this activity is to discover what domain knowledge can be extracted from those artifacts and how much effort is needed. The extracted domain knowledge should provide compulsory information, and probably also assistance, for model inference. For those software components, the software artifacts were created according to templates that are used across software development teams. Different teams might adapt the templates to their needs, but the main contents of each type of software artifact should be consistent.


Software specification

This document provides the system context of the component with a UML use case diagram where each client is denoted as an actor. The main functionalities and use case scenarios are described in natural language. Interfaces are defined as sets of function calls which characterize the behavior of a functionality. Since this document is created at the very initial stage of the development process, it carries the knowledge that developers had at that time. Later changes and the evolution of the software are not necessarily reflected in the document.

Testing specification

This document describes test cases using natural language. For each test case, the importance is labeled as either basic or advanced. As observed, basic test cases check the correct execution of operations, while advanced test cases examine whether the software behaves correctly when it receives abnormal requests. From this document, a set of functional requirements can be extracted, which can be used to validate or refine inferred models.

Interface definition file

As explained in Section 6.1.1.1, this type of file specifies all the functions used in each interface. The skeleton code of the software is automatically generated from it. Developers can then implement the body and logic of the software. For each function, the in-out parameters, return type and generation option are specified. To explicitly define how functions should be executed and how their skeletons should be generated, ASML software teams have standardized several generation options. For example, one generation option, called nonblocking, specifies that the client is not blocked when it calls this function. With this option specified, not only the skeleton of the request function but also a wait function is generated. During execution, the client calls the request function and then, after a while, calls the wait function to obtain the results. Since this type of file is proprietary, the precise function signatures need to be extracted with semantic knowledge.

6.1.2 Model-driven engineering

Due to the "formal" nature of MDE, the interface protocol in this paradigm is explicit and precise. ASML is using ASD [1] as the MDE tool, which allows specifications to be formulated as (a kind of) Mealy machines. The specification can then be formally verified and the corresponding code can be generated.

Using ASD, the specifications and components are described by Mealy machines. ASD adopts the Sequence-Based Specification Method [36], which forces developers to consider all possible transitions in each state, ensuring that the specification is complete. For each component, two types of models are specified, namely the Interface model and the Design model. An interface model specifies the behavior visible to the component's clients (i.e., the interface protocol). The design model implements the interactions with its clients and all other components it uses. ASD components can also use services from other components, which can either be modeled as ASD components as well, or be developed using traditional software engineering methods. These latter components are called foreign components, which can be, for example, third-party components, legacy code or handwritten code. Foreign components should have an interface model which formalizes and captures their external behavior and allows them to interact with the ASD components properly.


Figure 6.3: Model-Driven Engineering

Figure 6.4: ASD example: alarm system

6.1.2.1 Development Process

Figure 6.3 shows the development process using MDE. Here we take the component AlarmSystem as an example to explain the development process. The component AlarmSystem interacts with the components Siren and Sensor, as shown in Figure 6.4. The services used by AlarmSystem are specified in the interface models of Siren and Sensor, respectively. In this case, Siren and Sensor are foreign components and therefore do not have design models.

In the design phase, the interface and design models for component AlarmSystem are specified. After finishing the design, the correctness of the behavior is formally checked. First of all, a model checker checks whether all the interface models (i.e., AlarmSystem.im, Siren.im and Sensor.im, as shown in Figure 6.4) are free from deadlocks and livelocks. Then it verifies whether the design model of AlarmSystem properly uses the interface models of Sensor and Siren. Finally, it examines whether AlarmSystem's design model, along with the interface models of Sensor and Siren, correctly implements the behavior specified in the interface model of AlarmSystem. If defects are discovered during checking, developers have to repair the design. When these are all correct, the source code can be automatically generated from the interface and design models of AlarmSystem. After that, the integration and testing with the foreign components (i.e., Sensor and Siren) are conducted. Iterations are required when errors occur in this phase.

In conclusion, the interface protocol of software developed with the ASD tool is explicitly described by a Mealy machine and holds a formal relation with its design model. The source code is generated from the design models, preserving the behavior that has been specified.


6.1.3 Conclusion

In the traditional development process, the knowledge of interface protocols is distributed among multiple documents. The natural language descriptions in those documents might be ambiguous and incomplete. There is no (formally, mathematically proven) guarantee that the main software artifact, the source code, obeys the intended behavior.

Compared with the traditional development process (Figure 6.1), the model-based development process (Figure 6.3) shows at least three strengths. First, the behavioral correctness of software is verified at an earlier stage of the development process. The feedback loop between verification and design is shorter, as the generation of source code is not required for verification. The formal verification techniques guarantee that the external behavior of the software is preserved during evolution. Second, the code transformation is hidden, which allows developers to focus on the design of the software instead of developing and repairing bugs at the source code level. Finally, the interface protocols are explicitly defined by means of interface models. This paradigm provides a mechanism that forces developers to precisely specify the software behavior and directly update the design when they make modifications.

The strengths of MDE motivate the shift from code toward models. For traditionally engineered software, a lot of effort and expert resources have been invested in developing well-functioning software; thus, simply discarding it is not a cost-effective option. Extracting knowledge from it is practically valuable for reducing the workload required to construct models. However, its potentially permissive implementation also suggests that what has been implemented may not be what was desired. This confirms the idea that simply applying active or passive learning techniques cannot learn the intended behavior, and as such, the injection of domain expert knowledge is indispensable.

By studying the software artifacts of the traditional software, we identified that a complete set of method signatures on interfaces can be easily extracted from the interface definition files using semantic knowledge. This information is used for constructing input alphabets for active learning and for filtering event logs for passive learning.

6.2 Case selection and goal

In Chapter 3 we provided a high-level overview of the workflow for interface protocol inference. The workflow is expected to be generic and applicable in other contexts. Here we describe how the proposed workflow was executed for a concrete case study. The goal of this study is to identify the concrete steps for learning interface protocols within the ASML context. We study a component of the TWINSCAN machine, called LOPW wtc, which was developed with ASD.

The workflow is proposed for inferring behavioral models from traditionally developed software. We are aware that the best candidate for validating the correctness of the proposed approach is therefore traditionally developed software. However, due to several limitations, in this project we apply the workflow to an ASD component. Both active and passive learning techniques have difficulties learning data-flow and nondeterministic behavior. It could take a lot of time to find a learnable software component, as determinism is not guaranteed in the traditional development process; encountering nondeterminism while learning could hinder our study. Furthermore, our combining approach requires expert knowledge to classify the unresolved behavior. The learned result has to be discussed iteratively with experts to check whether it conforms to their intentions. For such intensive queries, a well-planned schedule is needed to minimize the impact on the experts' regular work.

For the reasons above, we apply the workflow to an ASD component to identify the concrete activities, and discuss the expected differences for learning traditionally developed components.

36 Combining model learning results for interface protocol inference


Figure 6.5: The context of LOPW wtc component

We simulate the role of experts and expect this study to serve as a guideline for inferring interface protocols from traditionally developed components in the future.

6.2.1 The SUL

In this section we introduce the selected SUL, the LOPW wtc component. This component is used to control the wafer table cleaning process. It is a subcomponent of the LOPW component, which is responsible for the high-level control of lot and wafer processing.

Figure 6.5 shows the system context of the LOPW wtc component. This component provides services to its clients via interface p and uses services provided by two components via interfaces u1 and u2. Our learning scope is to obtain the interface protocol describing how LOPW wtc can be used via the interface p.

6.3 Execution

Figure 6.6 gives an overview of the workflow executed within this study. It shows the activities, the inputs and outputs of each activity, and the tools used.

6.3.1 Preparation

In the first phase, we prepared the inputs for learning. The tasks include extracting function signatures and querying the event log.

6.3.1.1 Extracting function signatures

The function signatures are explicitly defined in ASD model files. We obtained the function signatures from the ASD models of LOPW wtc. We use Fp, Fu1 and Fu2 to denote the function signatures extracted from the interfaces p, u1 and u2, respectively. We expect that this activity will be executed differently when dealing with traditionally engineered software. As discussed in Section 6.1.1.2, the interface definition files contain generic function signatures for all interfaces of a component. Semantic knowledge is required to extract concrete function signatures from the files. However, the extraction can be easily automated.
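
Since the extraction can be automated, a minimal sketch of such an extractor might look as follows. The C-style declaration format and the regular expression are assumptions; the actual ASD model files and interface definition files would require a dedicated parser.

```python
import re

def extract_signatures(interface_definition: str) -> list[str]:
    # Pull function names from C-style declarations such as
    # "void initialize(void);". Illustrative only: the real interface
    # definition file format is not reproduced here.
    return re.findall(r"\b(\w+)\s*\(", interface_definition)
```

For example, `extract_signatures("void initialize(void); int halt();")` yields `["initialize", "halt"]`.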


Figure 6.6: An overview of the workflow execution for the ASML case study


6.3.1.2 Querying event log

There are two “sources” of logs at ASML. Production logs are recorded from the execution of machines in a production environment. Testing logs are generated from the execution of test cases in a simulation environment. Due to the constrained time for this project, we only obtained the logs generated from the execution of the so-called auto-tester for the LOPW component. This auto-tester is a unit testing suite for the LOPW component and is expected to check all of its external behavior. For each test case, a log was generated; we obtained 1436 logs in total. As already mentioned, the LOPW wtc component is a subcomponent of the LOPW component. The obtained event logs record how the LOPW wtc component interacted with other subcomponents inside the LOPW component during testing.

We are aware of the fact that test cases can be very different from real use cases. For instance, some tested behavior (corner cases) might never be used or observed during production. However, we argue that test execution logs are valuable as they reflect which behavior developers care about most and thus intend to guarantee. Comparing the models learned from different sources of logs might bring different insights into the usage of the interface protocol.

For each event in the logs, six attributes are recorded, namely component, timestamp, subcomponent, lifecycle, interface, and function. The attribute component gives the component name, which is LOPW for all events in this case. The occurrence time of each event is encoded in the attribute timestamp. The attribute subcomponent indicates which subcomponent the event belongs to. For each function call, the start and completion are logged separately; the attribute lifecycle indicates the start or completion of that function call. The names of the interface and the function call are encoded in the attributes interface and function, respectively.

6.3.2 Learning

After executing the preparation activities, we obtained three function call signature sets Fp, Fu1 and Fu2, and the test execution logs. With these inputs, we started the learning phase.

Establishing abstraction

The behavior of the LOPW wtc component is not guarded by data. Hence, we established the abstraction at the level of individual function calls. This means we mapped each symbol of the input alphabet to a concrete function call implemented in the LOPW wtc component. The complete alphabet Σ consists of the symbols mapped to the concrete functions in Fp, Fu1 and Fu2. Σin denotes the input alphabet used by active learning algorithms to formulate queries. IF denotes the symbols mapped to the concrete functions in Fp.
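
The alphabet construction described above can be sketched as follows. All function names are hypothetical placeholders, since the concrete LOPW wtc signatures are not listed in this thesis.

```python
# Because the behavior is not data-guarded, each abstract symbol maps
# one-to-one to a concrete function call (illustrative names only).
F_p  = {"initialize", "clean_wtc", "clean_wtc_cb", "stop_production"}
F_u1 = {"u1_request"}   # assumed name on interface u1
F_u2 = {"u2_request"}   # assumed name on interface u2

SIGMA = F_p | F_u1 | F_u2   # the complete alphabet Sigma
IF    = F_p                 # symbols visible on the provided interface p

# Abstract symbol -> concrete call on the component.
abstraction = {symbol: f"LOPW_wtc.{symbol}" for symbol in SIGMA}
```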

Active learning

Connecting the learner to the SUL. We created a wrapper that maps each symbol in Σin to the concrete function call on the LOPW wtc component. The wrapper is also responsible for capturing the sequence of outputs from the LOPW wtc component. The implementation of this wrapper is facilitated by ASD, as it provides stub code generation for interface models. We utilized this functionality to generate stubs for the interfaces surrounding the LOPW wtc component, and then implemented a global queue to capture the sequence of output actions. During learning, the learner receives the output sequence from the wrapper for each corresponding trigger. We expect that implementing such a wrapper is also convenient for traditionally engineered software: the interface definition files specify all the functions used in the interfaces, and these files are used to automatically generate skeleton code of the software.


Furthermore, the online interaction mechanism of active learning requires the SUL to be reactive even when illegal actions are triggered. By default, ASD components terminate immediately when they detect an attempted illegal action. In such a case, the learner would not get any response from the SUL, causing learning to fail. Our solution for this particular case is to modify the default diagnostic handler to throw an exception which notifies the learner. Our experience indicates that before learning traditionally engineered components, checking how they handle illegal triggers is needed.
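
A minimal wrapper of the kind described might look as follows. The component API, the output-queue mechanism and the exception name are assumptions standing in for the ASD-generated stubs, not the actual implementation.

```python
from queue import Queue, Empty

class IllegalTrigger(Exception):
    """Raised instead of terminating when an illegal action is triggered,
    mirroring the modified diagnostic handler described above."""

class SULWrapper:
    # Maps abstract input symbols to concrete calls on the component and
    # captures the resulting output actions via a shared (global) queue.
    def __init__(self, component, timeout: float = 0.05):
        self.component = component
        self.outputs: Queue = Queue()   # filled by stub callbacks
        self.timeout = timeout

    def step(self, symbol: str):
        # Apply one abstract input; return the observed output sequence.
        try:
            getattr(self.component, symbol)()
        except IllegalTrigger:
            return ("illegal",)         # the learner observes the refusal
        observed = []
        while True:
            try:
                observed.append(self.outputs.get(timeout=self.timeout))
            except Empty:
                return tuple(observed)
```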

Learning the behavioral model. In this study, we used L* to actively learn the LOPW wtc component. We selected the random walk testing method for the active learner to check the equivalence between the hypothesis and the SUL. This choice was made because the LOPW wtc component is relatively simple and thus does not require many test cases to detect differences between the hypothesis and the SUL.

Projecting the model on the interface. We conducted the projection task in mCRL2, which provides tools to model, validate and verify systems and protocols. The output of L* is a DOT file which textually specifies the learned Mealy machine. Before importing the model into the mCRL2 toolset, we used a custom Python script to transform the learned Mealy machine into a labeled transition system in DOT format. The DOT file is then further converted to the LTS file format. Having a model in LTS format allows us to perform renaming using the mCRL2 toolset. Our learning scope is the interface protocol describing how the LOPW wtc component is used over interface p. The actions specified on interfaces u1 and u2 are invisible to the interface protocol. Therefore, we renamed all the transitions in Σ \ IF to τ. In order to hide the internal actions from model comparisons, we further reduced the model by performing weak trace reduction. This operation removes all τ transitions shown in the original model while preserving weak trace equivalence between the original model and the reduced model.
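
Conceptually, the hiding and weak trace reduction performed here can be sketched as follows. This is a toy illustration of the operations, not the mCRL2 tooling: actions outside IF are renamed to τ, and τ transitions are then eliminated via ε-closure and subset construction, which preserves the weak traces.

```python
TAU = "tau"

def hide(transitions, visible):
    # Rename every action outside the visible set (here IF) to tau.
    return {(s, a if a in visible else TAU, t) for (s, a, t) in transitions}

def eps_closure(state, transitions):
    # All states reachable from `state` via tau transitions only.
    closure, stack = {state}, [state]
    while stack:
        s = stack.pop()
        for (p, a, q) in transitions:
            if p == s and a == TAU and q not in closure:
                closure.add(q)
                stack.append(q)
    return frozenset(closure)

def weak_trace_reduce(initial, transitions):
    # Subset construction over visible labels: the result contains no tau
    # transitions and is weak-trace equivalent to the input LTS.
    start = eps_closure(initial, transitions)
    seen, result, todo = {start}, set(), [start]
    while todo:
        cur = todo.pop()
        for label in {a for (p, a, q) in transitions if p in cur and a != TAU}:
            nxt = frozenset().union(*[eps_closure(q, transitions)
                                      for (p, a, q) in transitions
                                      if p in cur and a == label])
            result.add((cur, label, nxt))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return start, result
```

On a small LTS `{(0, "p_call", 1), (1, "u1_call", 2), (2, "p_cb", 3)}` with visible set `{"p_call", "p_cb"}`, the internal `u1_call` becomes τ and disappears, leaving only the two visible transitions.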

After executing those steps, we obtained the active learning result MAL/IF which represents the implemented behavior on interface p.

Passive learning

Preprocessing logs. We parsed all information in the logs and converted it into a comma-separated values (CSV) file with a custom Java script. The log is then filtered by the attribute function: we kept only the events associated with the functions included in IF. Next, the events representing the start and completion of a function call are merged into one event with the completion time as the timestamp. In this case study we did not need to segment the log, as the traces are generated per test case and thus clearly separated from each other, but we expect that log segmentation is required for preprocessing production logs. The final activity is to convert the log from the CSV file format to the eXtensible Event Stream (XES) file format so that we can use the plug-ins provided in ProM to analyze the log.
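
The filtering and start/completion merging can be sketched as follows. The attribute names follow the log format described above; the lifecycle values "start" and "complete" are assumptions about the encoding.

```python
def preprocess(events, interface_functions):
    # Keep only events whose function belongs to IF, then merge each
    # start/complete pair of a call into a single event carrying the
    # completion timestamp.
    merged, open_calls = [], {}
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        if ev["function"] not in interface_functions:
            continue
        key = (ev["interface"], ev["function"])
        if ev["lifecycle"] == "start":
            open_calls[key] = ev
        elif key in open_calls:                 # completion of an open call
            del open_calls[key]
            merged.append({"function": ev["function"],
                           "timestamp": ev["timestamp"]})
    return merged
```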

Analyzing logs. ProM provides a plug-in, Dotted Chart, which shows an overview of the distribution of events over time. The dotted chart for the preprocessed log is shown in Figure 6.7. We configured the horizontal axis as time and the vertical axis as the event index in the trace. From the dotted chart, it appears that all traces start with the event initialize and end with the event stop production cb. It can also be observed that many repeating patterns exist in the log.

We have four observations from analyzing the log summary. Firstly, the function error occurred is included in IF but never observed in the log, which implies that the tester does not cover the related functionality. Secondly, the statistics show that all traces start with the event initialize. 99.443% of the traces end with the event stop production cb, while 0.557% end with initialize. These minority traces share the same pattern, which consists of only the event initialize. We looked into the original logs and observed that all subcomponents are initialized at the beginning of each test case execution. For these minority traces, we speculate that the LOPW wtc component was initialized but not used in the corresponding test cases. Taking into account the low occurrence


Figure 6.7: Dotted chart for the input logs

                                            Min   Mean   Max
Events per trace                              3     10    15
Event classes per trace                       3      4     5
Events per trace / event classes per trace    1    2.5     3

Table 6.1: A summary of the length of the shown patterns

of this pattern, we filtered out these traces. The remaining log consists of 20 patterns. The most frequent pattern is {initialize, stop production, stop production cb}, which appears 945 times. The four most frequent patterns occur 1359 times in total and make up 95.79% of the traces in the log. Six patterns appear only once in the log. These statistics indicate which behavior was tested (or used) more frequently, but they may also suggest that the test cases are biased toward several particular patterns. Finally, Table 6.1 summarizes the lengths of these 20 patterns in the log. On average, a trace consists of four event classes, each of which occurs 2.5 times in the trace. This implies that the learning result may contain some small cyclic structures.
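
The pattern-frequency part of this log summary is straightforward to compute; a minimal sketch:

```python
from collections import Counter

def pattern_stats(traces):
    # Count occurrences of each unique trace pattern and the fraction of
    # the log each pattern covers, most frequent first.
    counts = Counter(tuple(t) for t in traces)
    total = len(traces)
    return [(list(p), n, n / total) for p, n in counts.most_common()]
```

For example, `pattern_stats([["a", "b"], ["a", "b"], ["c"]])` reports `["a", "b"]` as the most frequent pattern, covering two thirds of the traces.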

We planned to use Disco to learn the process model. However, without a commercial license, Disco does not support processing logs that contain more than 100 traces. Hence, in this case study we did not execute this activity. The final output of this step is the input log LogIF, which consists of 20 unique patterns.

Learning behavioral models. We used the K-tails algorithm to learn the behavioral model. This algorithm only uses positive examples. First, it uses the positive traces to build a PTA, in which two strings reach the same state if they share the same prefix up to that state. The PTA precisely encodes the behavior shown in the log. The algorithm then starts merging from the initial state: in each iteration, it merges a pair of states that have identical suffixes of length k. The greater the value of k, the fewer merges can be done, resulting in a more overfitted model. We now discuss how to choose the value of k. As indicated in Table 6.1, the traces in the input log usually contain short repeating patterns. If the value of k is too high, the algorithm will only merge states that have long identical suffixes and thus ignore those short repeating patterns. Based on this observation, we first tried small values of k. We obtained two learned models, MPL/IF (k=1) for k = 1 and MPL/IF (k=2) for k = 2. In the next step we analyzed the learning results with measurement techniques to evaluate which model is better and whether the value of k should be increased.
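
The merging criterion can be sketched compactly in the equivalence-class formulation of k-tails, where states sharing identical sets of length-k suffixes are merged; this is a simplification of the pairwise-merge procedure described above, not the actual implementation used.

```python
from collections import defaultdict

def k_tails(traces, k):
    # The PTA is implicit: its states are the prefixes of the traces.
    prefixes = {()} | {tuple(t[:i]) for t in traces for i in range(1, len(t) + 1)}

    def tails(state):
        # The k-tails of a state: outgoing suffixes truncated at length k.
        return frozenset(tuple(t[len(state):len(state) + k])
                         for t in traces if tuple(t[:len(state)]) == state)

    # States with identical k-tails collapse into one merged state.
    groups = defaultdict(set)
    for state in prefixes:
        groups[tails(state)].add(state)
    return list(groups.values())
```

With `k = 1` and the traces `("a","b")` and `("a","b","a","b")`, the prefixes `("a",)` and `("a","b","a")` share the 1-tail `("b",)` and are merged, producing the small cycle that repeating patterns suggest.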


Figure 6.8: Fitness and precision measurement for MPL/IF (k=1) (left), MPL/IF (k=2) (middle) and MAL/IF (right)

6.3.2.1 Quality control

Measuring fitness and precision. In ProM, we used a plug-in called Multi-perspective Process Explorer to measure fitness and precision for the learned models MAL/IF, MPL/IF (k=1) and MPL/IF (k=2) with respect to LogIF. The fitness and precision are measured based on the concept of alignment. Computing the alignment of a trace requires pairwise matching between each event in the trace and the events allowed by the model. For a detailed explanation of alignments, we refer to van der Aalst et al. [44].

The Multi-perspective Process Explorer plug-in requires models as Petri nets. Therefore, we first applied the plug-in Convert to Petri Net to convert the learned models from LTS to Petri net. Figure 6.8 shows the measurement results. All learned models have perfect fitness, which means the active learning steps and the log preprocessing were executed correctly. Furthermore, this also implies that LogIF is completely included in MAL/IF and MPL/IF, so there is no need to execute the foreach loop in step 1 as described in Algorithm 1.

Regarding the precision measurement, we first discuss the results for MPL/IF (k=1) (left in Figure 6.8) and MPL/IF (k=2) (middle in Figure 6.8). The precision score for MPL/IF (k=2) is 94.9%, which means MPL/IF (k=2) contains very little behavior beyond that observed in LogIF. Based on this result, we argue that there is no need to further increase the value of k, as a greater value of k results in a more overfitted model. In contrast, MPL/IF (k=1) scores 85.6%, which implies that some generalization is present in the model. The precision scores for these two models indicate that 1 is the optimal value for k. For this reason, MPL/IF (k=1) was chosen as the passively learned model MPL/IF.

The precision score for MAL/IF is 60.1%. This suggests that MAL/IF contains more extra behavior than MPL/IF, with respect to the behavior shown in LogIF. As highlighted in Figure 6.8, the states in red rectangles 1, 2 and 3 are less precise (i.e., allow executing more activities


                            Classification by MPL/IF
Classification by MAL/IF    seq ∈ L(S)    seq ∉ L(S)
seq ∈ L(R)                       9            12
seq ∉ L(R)                       0           163

Table 6.2: The number of sequences for each category in the matrix

Figure 6.9: The relation between MPL/IF and MAL/IF

than recorded in the log) than other states. Several events are shown in gray because they were not aligned to LogIF. As learned from the log summary, the event class error occurred never occurs in LogIF. From the state in red rectangle 1, the event halt can be executed to reach the state in red rectangle 3. However, the self-loop event halt on this state is in gray, which implies that the event class halt never occurs multiple times in sequence in LogIF. In general, this measurement shows that, on the one hand, LogIF is incomplete in the sense that it misses the behavior associated with the event class error occurred, but on the other hand MAL/IF may contain a certain permissiveness as it allows some unobserved behavior.

6.3.3 Combining

After finishing the learning phase, we obtained the learned models MAL/IF and MPL/IF, which are the inputs for the combining phase. This phase is executed according to the combining approach explained in Chapter 5. The fitness score for MAL/IF suggests that step 1 of the approach (Algorithm 1 in Section 5.2) is not needed. In this case study, we use the W-Method to implement the function GetTraces for Algorithms 2 (Section 5.3) and 3 (Section 5.4).

6.3.3.1 Comparing behavioral models

This step aims at detecting the differing and overlapping behavior of MAL/IF and MPL/IF. First, we applied the W-Method to the learned models. As explained in Section 5.5.2, the W-Method is parametrized by the maximum difference between the number of states in the subject model and that in the reference model. MPL/IF and MAL/IF have 6 and 5 states, respectively; therefore, this parameter was configured as 1. We obtained two finite sets of traces characterizing MPL/IF and MAL/IF, respectively. The generated traces were then categorized into four groups according to the matrix shown in Table 5.1. Table 6.2 shows the number of traces for each category: 9 traces are positive and added to PTr, 163 traces are negative and added to NTr, and 12 traces are in UnresolvedTr. In addition, there is no trace accepted by MPL/IF but rejected by MAL/IF. This results in a special scenario where MPL/IF and MAL/IF are in the “inclusion” relation shown in Figure 6.9. It also suggests that there is no extra overapproximation learned by the passive learner; therefore, step 2 of the approach (Algorithm 2) is not needed either.
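
The categorization of the generated traces can be sketched as follows, where R stands for MAL/IF and S for MPL/IF, and the two predicates are membership tests in L(R) and L(S). This is an illustration of the matrix logic, not the actual tooling.

```python
def classify_traces(traces, accepted_by_R, accepted_by_S):
    # Partition W-Method traces according to the matrix of Table 5.1.
    PTr, NTr, unresolved = [], [], []
    for t in traces:
        in_R, in_S = accepted_by_R(t), accepted_by_S(t)
        if in_R and in_S:
            PTr.append(t)            # accepted by both: positive
        elif not in_R and not in_S:
            NTr.append(t)            # rejected by both: negative
        elif in_R:
            unresolved.append(t)     # only in L(R): an expert must decide
        # a trace only in L(S) would signal passive-learner overapproximation
    return PTr, NTr, unresolved
```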


Unresolved traces
 1. initialize, clean wtc, stop production
 2. initialize, clean wtc, halt
 3. initialize, halt, halt
 4. initialize, clean wtc, clean wtc cb, halt
 5. initialize, clean wtc, clean wtc cb, clean wtc, stop production
 6. initialize, clean wtc, clean wtc cb, clean wtc, halt
 7. initialize, clean wtc, clean wtc cb, clean wtc, clean wtc cb, halt
 8. initialize, error occurred
 9. initialize, clean wtc, clean wtc cb, error occurred
10. initialize, clean wtc, clean wtc cb, clean wtc, error occurred
11. initialize, clean wtc, error occurred
12. initialize, stop production, stop production cb, initialize, error occurred

Table 6.3: Unresolved traces

6.3.3.2 Classifying discrepancies

Table 6.3 lists the unresolved traces. Since we do not intend to validate the inferred results with this case study, we simulated the role of the experts. As an example, we classified traces 1 and 5 as negative (i.e., the event clean wtc is not allowed to be directly followed by the event stop production). These two traces were added to NTr, leaving 10 traces in UnresolvedTr. We classified the remaining traces in UnresolvedTr as positive and added them to PTr.

6.3.3.3 Combining behavior

Once the classification is done, we feed NTr and PTr to the RPNI algorithm, which builds a PTA from PTr and then iteratively merges states without violating NTr (i.e., the result of each merge must still reject all traces in NTr). This algorithm is guaranteed to produce a model that accepts all traces in PTr and rejects all traces in NTr. This model is the combined result based on the classifications. Figure 6.10 shows the model and the measured precision score. Compared with MAL/IF, this model does not allow executing the event stop production directly after clean wtc. Hence, its precision score is 61.8%, slightly higher than the precision of MAL/IF. However, the model still contains much more behavior than observed in the log. The precision will increase if more negative behavior is identified and removed.


Figure 6.10: The precision for the combined model

6.4 Discussion

In this section we discuss the lessons learned from this case study and possible improvements.

We identified the concrete activities for learning interface protocols from ASD components. We expect that the presented results can serve as practical guidance for future practice.

We learned that connecting the SUL to the active learner requires domain knowledge as well. For instance, it requires us to check how the SUL reacts to illegal triggers and whether the reaction can be captured by the learner. This can be labour-intensive if a manual connection is compulsory, which suggests the need to investigate ASML's traditionally developed software in order to estimate the effort required to connect the learner to the SUL and to evaluate the possibility of automation.

Our case study also shows that analyzing logs is an essential task. It gives a comprehensive report of the input data and provides detailed insights into the behavior of components. The analysis may enable further log preprocessing activities. Moreover, the analysis also aids developers in making decisions: as illustrated in this study, the statistical summary of traces helped us configure and tune the parameter of the K-tails algorithm.

We involved considerable human judgment in evaluating the quality of the active and passive learning results. As shown in this study, we discarded MPL/IF (k=2) as it has a very high precision score. However, it can be hard for practitioners who lack a basic understanding of the measurement techniques or domain knowledge to make such judgments. To address this problem, we suggest formalizing quantitative criteria for qualified models.

The proposed workflow attempts to assemble different techniques for interface protocol inference. Different techniques are supported by different tools. A practical challenge we encountered in this case study is that switching tasks between different tools is time-consuming and laborious: the output of one tool usually requires format conversion before it can be imported into another. A solution to this problem would be to develop a framework that integrates all the tools and automates the file conversions.

Finally, as clarified above, this case study is not suitable for evaluating the correctness and accuracy of the combined result. We suggest validating the feasibility of the proposed approach on traditionally developed software with expert knowledge involved.


Chapter 7

Conclusion and future research

7.1 Conclusion

In this project we investigated the application of model inference techniques to infer interface protocols from component-based software. Active learning techniques exhaustively query the SUL in an isolated setup and thus learn a model describing the implemented behavior. Passive learning techniques learn the partial interactions between the SUL and its environment from incomplete communication logs. Motivated by the complementary nature of the results learned by these two techniques, we proposed an approach that incorporates domain knowledge to derive combined results.

To address the first research statement, we studied the challenges of active and passive learning techniques for interface protocol inference. The identified challenges include: 1) establishing a proper joint abstraction, 2) ensuring the completeness of active learning results, 3) ensuring the correctness of segmented logs, 4) generating negative examples, and 5) evaluating the completeness of logs. We provided suggestions for solutions to 1) and 3). The solutions for 2) and 4) are presented in the combining approach, while the solution for 5) is reflected in the quality control step of the workflow.

To address the second research statement, we proposed a workflow which illustrates the steps for interface protocol inference. The workflow comprises three phases, namely preparation, learning and combining. We leveraged log analysis approaches and model quality measurement techniques from the process mining domain to provide descriptive analyses of passively and actively learned models. The activities in the combining phase reflect the proposed combining approach. We analyzed the relations between learned models and logs for different scenarios. The proposed approach consists of three steps. The first step guarantees that the active learning result includes the behavior shown in the log. The second step identifies the valid overapproximation inferred by the passive learner. Finally, expert knowledge is involved in classifying the overapproximation learned by the active learner. The resulting model is produced based on all classification results.

We applied the workflow to an ASML (MDE-developed) component. The case study presents the concrete activities for each step. It shows that the proposed workflow is feasible for model inference from MDE-based components, and that the analysis and measurement techniques can provide insights into software components and thus help experts make decisions. We consider this case study a guideline for applying the approach to traditionally engineered software components.

Our work gives an overview of the challenges of applying learning techniques for interface protocol inference, and suggests a possible way to compare and combine different learning results. It can serve as an initial exploration for future interface protocol inference research.

7.2 Future research

There are many open questions in this field. This section discusses research ideas that might be valuable for future work.

7.2.1 Learning data-flow

In this project we constrained our solutions to a simple class of systems that do not contain data-guarded behavior. To extend inference to a larger class of systems, it is essential to learn how data influences software behavior. Learning such behavior is a big challenge for both active and passive learning techniques. It might require us to open the black box (i.e., the software component) and explore white-box techniques to infer the data perspective of software behavior. Some existing academic prototypes propose hybrid learning techniques to learn data-flow. Howar et al. [21] use symbolic analysis to iteratively refine active learning results. Walkinshaw et al. [53] leverage data mining techniques to learn data-flow on top of the control-flow from execution logs. Future work could evaluate the feasibility of these existing hybrid techniques and develop hybrid solutions to address the problem.

7.2.2 Evaluating passive learning algorithms

Some algorithms outperform others when dealing with long traces, while others perform better on traces with a larger alphabet. To provide a guideline for selecting passive learning algorithms, experimental studies could evaluate the performance of algorithms in coping with different features of input logs. Furthermore, action research is a promising way to evaluate algorithms for a specific application or context (e.g., learning ASML software components). As an example, Wieman et al. [55] conducted action research at a payment company, intensively involving on-site experts to iteratively discuss the effectiveness and improvement of three passive learning algorithms.

7.2.3 Mining property instances

Our current combining approach requires expert knowledge to classify the overapproximations learned by the active and passive learning techniques. The classification is done on representative traces: once a trace is classified as negative, the resulting model will reject it. This approach poses concrete questions (acceptance of traces) to the experts, but the workload can be substantial, depending on the number of generated traces. Alternatively, we can construct property instances in terms of automata and then compose models using automaton operations (e.g., difference, intersection, union). Beschastnikh et al. [6] propose an approach to mine property instances from execution traces (positive examples) and then compose all instances using the automaton intersection operation. We can leverage this idea for our purpose. For instance, if experts identify that event a cannot be directly followed by event b, then the automaton representing the language Σ∗(ab)Σ∗ can be subtracted from the overapproximated model. However, deriving such properties from scratch is difficult. A solution could be to present the overapproximated behavior to experts as representative traces and then guide them to generalize property instances from these.
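
For the example property above, the language Σ∗(ab)Σ∗ can be checked directly on a trace; the full approach would subtract the corresponding automaton from the model, but a trace-level sketch of the property instance is:

```python
def violates_never_directly_followed(trace, a, b):
    # True iff the trace lies in Sigma* (a b) Sigma*, i.e. it contains
    # event a immediately followed by event b and must be rejected.
    return any(x == a and y == b for x, y in zip(trace, trace[1:]))
```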

Combining model learning results for interface protocol inference 47
CHAPTER 7. CONCLUSION AND FUTURE RESEARCH

7.2.4 Combining techniques in the learning process

The application of active learning techniques to a large-scale system is hindered by the large number of required membership queries. The execution logs may already contain the answers to some queries and can thus serve as a local oracle that first attempts to answer the queries generated by the active learner. Costly queries to the system are required only when they cannot be resolved by this local oracle. In this way, the number of queries to the system is expected to be reduced.
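A minimal sketch of such a local oracle, under the assumption that logged traces are positive examples and that every prefix of an observed trace is also valid behaviour; the names (`LogBackedOracle`, `system_query`) are hypothetical:

```python
class LogBackedOracle:
    """Membership oracle that first consults a prefix-closed set of logged
    traces and only queries the real system when the log is inconclusive."""

    def __init__(self, logged_traces, system_query):
        self.known = set()
        for trace in logged_traces:
            # Every prefix of an observed trace is known-accepted behaviour.
            for i in range(len(trace) + 1):
                self.known.add(tuple(trace[:i]))
        self.system_query = system_query  # expensive fallback to the system
        self.system_calls = 0

    def member(self, word):
        if tuple(word) in self.known:
            return True  # resolved locally, no system interaction needed
        self.system_calls += 1
        return self.system_query(word)

# Example: the log resolves queries about observed prefixes; only
# unseen words reach the (here stubbed-out) system.
oracle = LogBackedOracle([('open', 'read', 'close')],
                         system_query=lambda w: False)
```

For instance, `oracle.member(('open', 'read'))` is answered from the log, whereas `oracle.member(('read',))` falls through to the system and increments `oracle.system_calls`.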

7.2.5 Integrating existing domain knowledge

Our approach involves experts in the loop so that the classification can be done online. However, some domain knowledge might already be available in software artifacts (e.g., documentation). Integrating such existing domain knowledge into the learning process may reduce the workload of experts. There are several possible ways to incorporate domain knowledge. First, we could formulate available software requirements as property instances and compose them with the learned models. Second, the requirements could be formalized in Linear Temporal Logic (LTL), which allows validating the consistency between learned models and domain knowledge.
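As an illustration of the second option, an LTL formula interpreted over finite traces can be checked directly against the representative traces of a learned model. This recursive evaluator is only a sketch (practical tools translate LTL to automata instead), and the event names are made up:

```python
# Formulas are nested tuples, e.g.
#   ("always", ("implies", ("atom", "open"), ("eventually", ("atom", "close"))))
# encodes G(open -> F close) over a finite trace.

def holds(f, trace, i=0):
    """Evaluate formula f on trace starting at position i."""
    op = f[0]
    if op == "atom":
        return i < len(trace) and trace[i] == f[1]
    if op == "not":
        return not holds(f[1], trace, i)
    if op == "implies":
        return (not holds(f[1], trace, i)) or holds(f[2], trace, i)
    if op == "next":
        return i + 1 < len(trace) and holds(f[1], trace, i + 1)
    if op == "always":      # G f: f holds at every remaining position
        return all(holds(f[1], trace, j) for j in range(i, len(trace)))
    if op == "eventually":  # F f: f holds at some remaining position
        return any(holds(f[1], trace, j) for j in range(i, len(trace)))
    raise ValueError(f"unknown operator: {op}")

# Requirement from documentation (hypothetical): every open is
# eventually followed by close.
req = ("always", ("implies", ("atom", "open"),
                  ("eventually", ("atom", "close"))))
```

A learned model is then consistent with the requirement if `holds(req, t)` is True for every representative trace t; a trace such as `["open", "read"]` would reveal an inconsistency.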


Bibliography

[1] Verum homepage.

[2] Fides Aarts, Faranak Heidarian, Harco Kuppens, Petur Olsen, and Frits Vaandrager. Automata learning through counterexample guided abstraction refinement. In International Symposium on Formal Methods, pages 10–27. Springer, 2012.

[3] Fides Aarts, Bengt Jonsson, and Johan Uijen. Generating models of infinite-state communication protocols using regular inference with abstraction. In IFIP International Conference on Testing Software and Systems, pages 188–204. Springer, 2010.

[4] Marco Almeida, Nelma Moreira, and Rogerio Reis. Testing the equivalence of regular languages.

[5] Dana Angluin. Learning regular sets from queries and counterexamples. Information and Computation, 75(2):87–106, 1987.

[6] Ivan Beschastnikh, Yuriy Brun, Jenny Abrahamson, Michael D. Ernst, and Arvind Krishnamurthy. Unifying FSM-inference algorithms through declarative specification. In Software Engineering (ICSE), 2013 35th International Conference on, pages 252–261. IEEE, 2013.

[7] Alan W. Biermann and Jerome A. Feldman. On the synthesis of finite-state machines from samples of their behavior. IEEE Transactions on Computers, 100(6):592–597, 1972.

[8] Tsun S. Chow. Testing software design modeled by finite-state machines. IEEE Transactions on Software Engineering, (3):178–187, 1978.

[9] Jonathan E. Cook and Alexander L. Wolf. Discovering models of software processes from event-based data. ACM Transactions on Software Engineering and Methodology, 7(3):215–249, 1998.

[10] Massimiliano De Leoni, Wil M.P. van der Aalst, and Marcus Dees. A general process mining framework for correlating, predicting and clustering dynamic behavior based on event logs. Information Systems, 56:235–257, 2016.

[11] Pierre Dupont, Bernard Lambeau, Christophe Damas, and Axel van Lamsweerde. The QSM algorithm and its application to software behavior model induction. Applied Artificial Intelligence, 22(1-2):77–115, 2008.

[12] William Durand. Automated test generation for production systems with a model-based testing approach. PhD thesis, Universite Blaise Pascal-Clermont-Ferrand II, 2016.

[13] Michael D. Ernst. Static and dynamic analysis: Synergy and duality. In WODA 2003: ICSE Workshop on Dynamic Analysis, pages 24–27, 2003.

[14] E. Mark Gold. Language identification in the limit. Information and Control, 10(5):447–474, 1967.

[15] Christian W. Gunther and Anne Rozinat. Disco: Discover your processes. BPM 2012 Demonstration Track, page 40, 2012.

[16] Monika Gupta and Ashish Sureka. Nirikshan: Mining bug report history for discovering process maps, inefficiencies and inconsistencies. In Proceedings of the 7th India Software Engineering Conference, page 1. ACM, 2014.

[17] Amaury Habrard, Marc Bernard, and Marc Sebban. Improvement of the state merging rule on noisy data in probabilistic grammatical inference. In European Conference on Machine Learning, pages 169–180. Springer, 2003.

[18] Brent Hailpern and Peri Tarr. Model-driven development: The good, the bad, and the ugly. IBM Systems Journal, 45(3):451–461, 2006.

[19] Christian A. Hammerschmidt, Radu State, and Sicco Verwer. Human in the loop: Interactive passive automata learning via evidence-driven state-merging algorithms. arXiv preprint arXiv:1707.09430, 2017.

[20] Mohammad Iftekharul Hoque, Vijay Nag Ranga, Anurag Reddy Pedditi, Rachitha Srinath, Md Ali Ahsan Rana, Md Eftakhairul Islam, and Afshin Somani. An empirical study on refactoring activity. arXiv preprint arXiv:1412.6359, 2014.

[21] Falk Howar, Dimitra Giannakopoulou, and Zvonimir Rakamaric. Hybrid learning: interface generation through static, dynamic, and symbolic analysis. In International Symposium on Software Testing and Analysis, pages 268–279. ACM, 2013.

[22] Falk Howar, Bernhard Steffen, and Maik Merten. Automata learning with automated alphabet abstraction refinement. In International Workshop on Verification, Model Checking, and Abstract Interpretation, pages 263–277. Springer, 2011.

[23] Pham Ngoc Hung, Toshiaki Aoki, and Takuya Katayama. A minimized assumption generation method for component-based software verification. In International Colloquium on Theoretical Aspects of Computing, pages 277–291. Springer, 2009.

[24] Hardi Hungar, Tiziana Margaria, and Bernhard Steffen. Test-based model generation for legacy systems. In Test Conference, 2003. Proceedings. International, pages 150–159. IEEE, 2003.

[25] Malte Isberner, Falk Howar, and Bernhard Steffen. The TTT algorithm: a redundancy-free approach to active automata learning. In International Conference on Runtime Verification, pages 307–322. Springer, 2014.

[26] Miryung Kim, Thomas Zimmermann, and Nachiappan Nagappan. A field study of refactoring challenges and benefits. In ACM SIGSOFT International Symposium on the Foundations of Software Engineering, page 50. ACM, 2012.

[27] Sander J.J. Leemans, Dirk Fahland, and Wil M.P. van der Aalst. Discovering block-structured process models from event logs containing infrequent behaviour. In International Conference on Business Process Management, pages 66–78. Springer, 2013.

[28] Manny M. Lehman. Laws of software evolution revisited. In European Workshop on Software Process Technology, pages 108–124. Springer, 1996.

[29] Jiafei Li, R.P. Jagadeesh Chandra Bose, and Wil M.P. van der Aalst. Mining context-dependent and interactive business process maps using execution patterns. In International Conference on Business Process Management, pages 109–121. Springer, 2010.

[30] Davide Lorenzoli, Leonardo Mariani, and Mauro Pezze. Automatic generation of software behavioral models. In International Conference on Software Engineering, pages 501–510. ACM, 2008.

[31] Ronny S. Mans, M.H. Schonenberg, Minseok Song, Wil M.P. van der Aalst, and Piet J.M. Bakker. Application of process mining in healthcare: a case study in a Dutch hospital. In International Joint Conference on Biomedical Engineering Systems and Technologies, pages 425–438. Springer, 2008.

[32] Flemming Nielson, Hanne R. Nielson, and Chris Hankin. Principles of Program Analysis. Springer, 2015.

[33] N. Yang. Research proposal - exploring learning techniques for inferring model from industrial software component. In Report of preparation phase for the master thesis project. ASML - TU/e, 2017.

[34] Jose Oncina and Pedro Garcia. Inferring regular languages in polynomial updated time. In Pattern Recognition and Image Analysis: Selected Papers from the IVth Spanish Symposium, pages 49–61. World Scientific, 1992.

[35] Ammar Osaiweran, Mathijs Schuts, Jozef Hooman, Jan Friso Groote, and Bart van Rijnsoever. Evaluating the effect of a lightweight formal technique in industry. International Journal on Software Tools for Technology Transfer, 18(1):93–108, 2016.

[36] Stacy J. Prowell and Jesse H. Poore. Foundations of sequence-based software specification. IEEE Transactions on Software Engineering, 29(5):417–429, 2003.

[37] Harald Raffelt, Bernhard Steffen, and Therese Berg. LearnLib: A library for automata learning and experimentation. In International Workshop on Formal Methods for Industrial Critical Systems, pages 62–71. ACM, 2005.

[38] Ronald L. Rivest and Robert E. Schapire. Inference of finite automata using homing sequences. Information and Computation, 103(2):299–347, 1993.

[39] Marc Sebban, Jean-Christophe Janodet, and Frederic Tantini. Blue: a blue-fringe procedure for learning DFA with noisy data. In Proceedings of the Genetic and Evolutionary Computation Conference, 2004.

[40] B. A. Trakhtenbrot. Finite automata: behavior and synthesis. Journal of Symbolic Logic, 42(1):111–112, 1977.

[41] Jan Tretmans. Model based testing with labelled transition systems. In Formal Methods and Testing, pages 1–38. Springer, 2008.

[42] Frits Vaandrager. Model learning. Communications of the ACM, 60(2):86–95, 2017.

[43] Wil van der Aalst. Process mining: Overview and opportunities. ACM Transactions on Management Information Systems (TMIS), 3(2):7, 2012.

[44] Wil van der Aalst, Arya Adriansyah, and Boudewijn van Dongen. Replaying history on process models for conformance checking and performance analysis. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(2):182–192, 2012.

[45] Wil van der Aalst, Ton Weijters, and Laura Maruster. Workflow mining: Discovering process models from event logs. IEEE Transactions on Knowledge and Data Engineering, 16(9):1128–1142, 2004.

[46] Wil M.P. van der Aalst. Process mining: Overview and opportunities. ACM Transactions on Management Information Systems (TMIS), 3(2):7, 2012.

[47] Wil M.P. van der Aalst. Process Mining: Data Science in Action. Springer, 2016.

[48] B.F. van Dongen, A.K.A. de Medeiros, H.M.W. Verbeek, A.J.M.M. Weijters, and W.M.P. van der Aalst. The ProM framework: A new era in process mining tool support. In International Conference on Application and Theory of Petri Nets, pages 444–454. Springer, 2005.

[49] Michele Volpato and Jan Tretmans. Approximate active learning of nondeterministic input output transition systems. Electronic Communications of the EASST, 72, 2015.

[50] Neil Walkinshaw and Kirill Bogdanov. Inferring finite-state models with temporal constraints. In Proceedings of the 2008 23rd IEEE/ACM International Conference on Automated Software Engineering, pages 248–257. IEEE Computer Society, 2008.

[51] Neil Walkinshaw and Kirill Bogdanov. Automated comparison of state-based software models in terms of their language and structure. ACM Transactions on Software Engineering and Methodology, 22(2):13, 2013.

[52] Neil Walkinshaw, Kirill Bogdanov, Mike Holcombe, and Sarah Salahuddin. Improving dynamic software analysis by applying grammar inference principles. Journal of Software: Evolution and Process, 20(4):269–290, 2008.

[53] Neil Walkinshaw, Ramsay Taylor, and John Derrick. Inferring extended finite state machine models from software executions. Empirical Software Engineering, 21(3):811–853, 2016.

[54] Lijie Wen, Jianmin Wang, Wil M.P. van der Aalst, Biqing Huang, and Jiaguang Sun. A novel approach for process mining based on event types. Journal of Intelligent Information Systems, 32(2):163–190, 2009.

[55] Rick Wieman, Maurício Finavaro Aniche, Willem Lobbezoo, Sicco Verwer, and Arie van Deursen. An experience report on applying passive learning in a large-scale payment company. In Software Maintenance and Evolution (ICSME), 2017 IEEE International Conference on, pages 564–573. IEEE, 2017.