Web Based GUI for Natural Deduction Proofs in Isabelle

Jonas Halvorsen

Master of Science

Artificial Intelligence

School of Informatics

University of Edinburgh

2007

Abstract

It is fair to say that the use of interactive theorem provers is mostly limited to experts

in the field. This project attributed this mainly to the high barrier of entry associated

with using interactive theorem provers, and that most current systems do not aid the

user in visualizing proofs.

A web-based client/server system with a graphical user interface was designed and im-

plemented that users could use to perform point-and-click natural deduction theorem

proving. The system did not require client users to install software in order to perform

proofs, as the system was accessible through the use of a web browser. Proofs were

visualized in box-style notation, and proof construction done by performing point-and-

click actions on this. The sound and widely used interactive theorem prover Isabelle

was used for verifying the proofs created. The system was deemed successful, based

on the analysis of a user test performed.

Acknowledgements

First, I would like to thank my project supervisor, Dr. Jacques Fleuriot, for his helpful

guidance and exceptional dedication to the project undertaken. His extraordinary en-

thusiasm drove the project forwards in difficult times, and his knowledge in the subject

field of interactive theorem proving proved invaluable.

I would also like to thank Sean Wilson for his valuable contribution to the project

in terms of comments and support. His meticulous corrections to the report were very

helpful.

Contents

1 Project Statement

1.1 Introduction

1.2 Description of Problem

1.2.1 Expert knowledge

1.2.2 Limited availability

1.3 Project Aim

1.4 Project Objectives

2 Background and Existing Work

2.1 Introduction

2.2 Interactive Theorem Provers

2.2.1 Isabelle

2.3 Proof Editors

2.3.1 Proof General

2.3.2 Pcoq, LogiCoq and IsaWin

2.3.3 Pandora

2.3.4 Jape

2.3.5 System Coupling

2.3.6 ProofWeb

2.4 Graphical Proof Representation

2.4.1 Fitch-style notation

3 Requirements

3.1 Introduction

3.2 Functional Requirements

3.2.1 Box-style Notation

3.2.2 Point-and-click Proof Creation

3.2.3 Store proof scripts

3.2.4 Open proof scripts

3.2.5 Verify proof scripts

3.3 User Interface Requirements

3.3.1 Easy to understand

3.3.2 Hide theorem prover syntax

3.3.3 Provide help

3.4 Accessibility and Performance Requirements

3.4.1 Provide theorem proving remotely

3.4.2 Appear to work locally

4 System Specifications and Design

4.1 Introduction

4.2 Technology and External Software

4.2.1 Isabelle

4.2.2 PGIP and Proof General Kit Broker

4.2.3 AJAX and jQuery

4.2.4 PHP

4.2.5 MySQL

4.3 Conceptual System Design

4.3.1 System Overview of Architecture

4.3.2 Web Service

4.3.3 Web Client

4.3.4 Persistent Storage

4.4 User Interface

4.5 Design Summary

5 Implementation

5.1 Introduction

5.2 Web Service

5.2.1 PGKit Issues

5.2.1.1 Available version non-working

5.2.1.2 Instability

5.2.1.3 Few PGIP commands implemented

5.2.1.4 Complexity of PGIP protocol

5.2.1.5 PGIP missing a remove object command

5.2.2 Isabelle Issues

5.2.2.1 Missing XML output feature

5.2.2.2 PGIP communication

5.2.3 PHP issues

5.2.3.1 Simplexml and mixed content nodes

5.2.3.2 Security mode restrictions

5.3 Web Client

5.3.1 Creating and Displaying the proof hierarchy

5.3.1.1 Isabelle does not recall proof history

5.3.1.2 Isabelle's subgoal numbering

5.3.1.3 Repeated assumption listings

5.3.1.4 Converting from XML to HTML

5.3.1.5 Javascript timeout

5.3.2 Point-and-click Proof Creation

5.3.2.1 Available proof rules

5.3.2.2 Proof dependencies and reuse

5.3.2.3 Isabelle's lack of labelling

5.3.2.4 Replaying proof

5.3.2.5 Instantiation of variables in proof rules

5.3.2.6 Closed subgoals

5.3.2.7 Closing subgoals

5.3.2.8 Undoing a proof step

5.3.2.9 Web browser compatibility

5.3.3 Message Ordering

5.3.3.1 Initial solution

5.3.3.2 Revision 2: callback functions

5.3.3.3 Revision 3: ajaxQuery plug-in

5.3.4 User Interface

5.3.4.1 Expanding proofs

5.3.4.2 Colour scheme

5.3.4.3 Help clues

5.3.4.4 Box hierarchy

5.3.4.5 Menu panel vs. Toolbar

5.3.4.6 Layout of proofs

5.3.4.7 Viewable proof script

5.3.4.8 Graphical confirmation of finished proofs

5.3.4.9 Show/hide proofs and proof branches

5.3.4.10 Meta-variables

5.3.4.11 Drag-and-drop vs. clickable expressions

5.3.4.12 Selecting subgoal to work on

5.3.4.13 Showing natural deduction rule names

5.3.4.14 Using mathematical symbols

5.3.4.15 Right-click undo

5.3.4.16 Customizability

5.4 User-story Walkthrough

6 Evaluation

6.1 Introduction

6.2 Test Data

6.3 Verification

6.3.1 Testing Framework and Tools

6.3.2 Unit Testing

6.3.3 Integration testing

6.3.4 System Testing

6.3.5 Results

6.4 Validation

6.4.1 User Test

6.4.1.1 Questionnaire

6.4.2 Results of User Test

6.4.2.1 Questionnaire part 1

6.4.2.2 Questionnaire parts 2-4, 6

6.4.2.3 Questionnaire part 5

6.4.3 Summary of Results

7 Discussion

7.1 Introduction

7.2 Achievements

7.3 Limitations

7.4 Criticism

7.4.1 General Architecture

7.4.2 PGKit Broker

7.4.3 Isabelle

7.4.4 Web Client

7.5 Outlook on Subject

7.6 Future Work

7.7 Final Remarks

Bibliography

Appendices

A XML Messages

A.1 PGIP reply

A.2 PGIP state display vs. revised XML

B Natural Deduction Rules

C Questionnaire

List of Figures

2.1 Proof General interface to Isabelle

2.2 Pandora interface

2.3 Jape interface

2.4 ProofWeb interface

2.5 Box-style proof

2.6 Gentzen-style proof

4.1 Use-Case Diagram

4.2 PGKit broker system architecture

4.3 PGIP message exchange

4.4 Architectural Overview

4.5 Sequence Diagram

4.6 Example Interaction Sequence

4.7 Draft GUI 1

4.8 Draft GUI 2

5.1 Proof by Contradiction

5.2 Law of Excluded Middle

5.3 Applying forward rule

5.4 Guide to user interface

5.5 Help clue

5.6 Menu panel

5.7 Proof script

5.8 Selecting assumption

5.9 Login screen

5.10 Empty desktop

5.11 Creating a new file

5.12 Empty file created

5.13 Add theorem button

5.14 Theorem definition prompt

5.15 New theorem added to script

5.16 →i button

5.17 Applying →i backwards

5.18 PBC button

5.19 Applying PBC backwards the 1st time

5.20 ¬ε button

5.21 Specifying new subgoals (1st ¬ε backwards)

5.22 Applying ¬ε backwards the 1st time

5.23 Assumption button

5.24 Closing a branch with an assumption

5.25 Applying ∀i backwards

5.26 Applying PBC backwards the 2nd time

5.27 Specifying new subgoals (2nd ¬ε backwards)

5.28 Applying ¬ε backwards the 2nd time

5.29 Instantiating a quantified variable

5.30 Applying ∃i backwards

5.31 The finished proof

List of Tables

6.1 SUS evaluation scores

Chapter 1

Project Statement

1.1 Introduction

Automated theorem proving is a subject field within Computer Science that aims to

automate the process of generating and verifying proofs by mechanical means. It has

numerous practical uses such as mechanizing mathematics, discovering novel proofs

(e.g. in Euclidean geometry), verification of algorithms in hardware and software

systems, and for developing reasoning systems for intelligent agents. It can improve

our understanding of how humans learn mathematics and automate tasks that we find

tedious [38, pp. 308-315].

However, fully automatic theorem proving is infeasible in most cases because most non-

trivial theorems are undecidable. The sheer size of the search space within most problem

domains and the processing time associated with searching for proofs are also partly

responsible for making it infeasible.

Human-guided theorem proving, also called interactive theorem proving, has however

been shown to be of use in practice. This involves combining the use of human intuition

and software automation in order to create proofs, and as a result it is often able to

reduce the search space significantly (compared to fully automatic systems) and thus

find proofs within realistic time [38, pp. 308-315].

Interactive theorem provers frequently rely on using symbolic logic in order to represent

theorems in mathematics or to mimic human reasoning. Novices wishing to use interac-

tive theorem provers thus have to go through the processes of learning a symbolic logic

language, such as propositional logic or predicate logic, while simultaneously learning

how to perform theorem proving within these logics [31].

Theorem proving within these logics is often done using natural deduction, a formal

reasoning model that aims to mimic human reasoning in order to make proofs easier to

perform by humans [27, pp. 6-27].

There is a steep learning curve associated with using interactive theorem provers since

one has to learn how the systems work and get to grips with the specifics of their

underlying proof languages in addition to the symbolic logic languages mentioned above.

This is further complicated by the user having to set up an interactive theorem proving

environment.

1.2 Description of Problem

The use of interactive theorem provers is mostly limited to experts in the field [26, 43].

This project aims to make theorem proving more widely used by reducing the

high barrier of entry associated with using interactive theorem provers and by aiding the

user in visualizing proofs performed using natural deduction.

1.2.1 Expert knowledge

The area of interactive theorem prover user interfaces has had limited attention and

success compared to the development of the theorem provers themselves. Although there

have been several previous projects aimed at creating user interfaces for theorem provers,

they have had little impact in the field and uptake has in general been quite low (the

Proof General generic interface application in Emacs being the exception) [26, 39, 43].

The few user interfaces that do exist rely mostly on textual representation of proofs

and the general consensus is that they do not do an especially good job of presenting

proofs in a form appropriate for human reading. As a result of lacking appropriate user

interfaces, the area of theorem proving has a high barrier of entry which limits its use

to experienced proof experts that know the prover's syntax and intricacies [26].

In order to reach a wider audience, it is necessary for the systems to be more convenient

to use and more accommodating to non-experts such as novice students or experts

in other academic fields that need to perform theorem proving. One solution to cater

for this is to provide effective user interfaces to the theorem provers that provide

point-and-click functionality and visualize formal proofs in an appropriate way for human

reading [39, 43, 47].

1.2.2 Limited availability

Another limit to the widespread use of theorem provers is the complexity of setting up

and running interactive theorem prover systems [30, 39].

Isabelle and many other interactive theorem provers usually only readily support the

architecture/operating system used by their developers (often Unix or Linux), creating a

barrier of entry in terms of the required platform. They can also require the presence of

dependency software, e.g. specific compilers [34]. For example, it is quite manageable to

get Isabelle working under Linux, but more difficult to get it running under Microsoft

Windows XP. Additionally, theorem proving often involves processes that can be resource

intensive (e.g. performing pattern matching and search), increasing local hardware

requirements for the prospective user [30].

1.3 Project Aim

To achieve the goal of making it easier for users to perform natural deduction theorem

proving, the project should address both the graphical representation and interactive

editing of proofs as well as the accessibility in terms of installing and running theorem

proving software.

The solution that we propose is to develop a client/server system that provides a web-

based interactive point-and-click graphical user interface that users can use to perform

natural deduction proofs in the higher order logic of Isabelle. It is hoped that this system

will make interactive theorem proving easier to perform, in terms of visualization and

removing the need for client-side installation, and thus make it more available for a

wider audience.

By providing the functionality of creating proofs by point-and-click interaction, the

system should remove the need for the user to learn prover-specific commands. This

should especially benefit novices in the field of logic, as it is difficult to learn and

understand the specic proof language of an interactive theorem prover in addition to

symbolic logic. Additionally, the system should represent proofs in box-style notation

which is the style that most logic text books use [1], in order to make proof construction

easier to follow.

By having a client/server architecture to the theorem prover engine that users can access

using a web-based user interface, the system should address the issue of availability.

Users should be able to perform interactive theorem proving using a web browser that

communicates asynchronously with a web service (so as to act as if it is being run

locally) that interacts with the Isabelle theorem prover.

1.4 Project Objectives

The envisioned system should provide the following features in order to reach the out-

lined aims of the project:

1. It should present the user with a graphical user interface.

2. It should present formal proofs in box-style notation.

3. It should provide the functionality to perform point-and-click creation of formal

natural deduction proof scripts.

4. It should provide a client/server architecture that a user can access through a web

browser.

5. It should be responsive so that the user feels like the system is being run locally.

Chapter 2

Background and Existing Work

2.1 Introduction

This chapter introduces relevant background knowledge in the field of proof editors for

use in interactive theorem proving. It touches upon relevant topics for this project such

as interactive theorem provers, proof editors and graphical proof representation.

2.2 Interactive Theorem Provers

The development of interactive theorem provers is an active area of research and there

are a number of theorem provers in existence and use. Interactive theorem provers such

as Coq [41], HOL [42] and Isabelle [34, 37] are just some of the popular ones within

the field [43]. They each have individual strengths and weaknesses making some more

appropriate in certain situations as compared to others. As each system has its own

proof script language, and as they each work in different ways, it is difficult for users

to swap between using different interactive theorem provers. As a result interactive

theorem prover users tend to stick to using the system they are already familiar with.

2.2.1 Isabelle

Isabelle [37] is a general purpose theorem prover, developed by Larry Paulson and

Tobias Nipkow. It is written in the ML functional programming language. It caters

for a wide range of different logics, from classical to intuitionistic logic, propositional

to higher-order logic, including set theory, and is widely used and known to be a sound

system [37]. Furthermore, Isabelle has its own meta-logic (a higher-order intuitionistic

logic) which it uses to define other logics, statements and inference rules.

Isabelle follows the Logic for Computable Functions (LCF) [23] approach, similar to

systems such as Coq and HOL. The LCF approach is based around an abstract data

structure that represents a theorem, and values of this type can only be constructed by

applying inference rules.

This design makes it relatively easy to implement in a programming language and also

makes the system easily extensible in terms of defining new rules based on the core

inference rules. Another benefit of using the LCF approach is that one is allowed to

work backwards from a goal, which can make the proof easier to direct [37].
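
The LCF idea can be illustrated with a minimal sketch. The code below is written in Python rather than ML and is not Isabelle's kernel; all names (Theorem, assume, imp_intro, imp_elim) are hypothetical and chosen only for illustration. Python cannot enforce abstraction the way the ML type system does, so a private token merely imitates the restriction that theorems are produced by inference rules alone.

```python
# Formulas: atoms are strings, implications are ("->", A, B) tuples.
# Hypothetical sketch of an LCF-style kernel, not Isabelle's implementation.

_KERNEL_TOKEN = object()  # held only by the inference rules below


class Theorem:
    """A proved sequent 'hyps |- concl'; only inference rules may create one."""

    def __init__(self, hyps, concl, _token=None):
        if _token is not _KERNEL_TOKEN:
            raise TypeError("theorems can only be created by inference rules")
        self.hyps = frozenset(hyps)
        self.concl = concl


def imp(a, b):
    """Build the formula A -> B."""
    return ("->", a, b)


def assume(a):
    """Assumption rule: A |- A."""
    return Theorem({a}, a, _token=_KERNEL_TOKEN)


def imp_intro(a, th):
    """From H |- B derive H - {A} |- A -> B."""
    return Theorem(th.hyps - {a}, imp(a, th.concl), _token=_KERNEL_TOKEN)


def imp_elim(th_imp, th_a):
    """From H1 |- A -> B and H2 |- A derive H1 u H2 |- B."""
    c = th_imp.concl
    if not (isinstance(c, tuple) and c[0] == "->" and c[1] == th_a.concl):
        raise ValueError("imp_elim: premises do not match")
    return Theorem(th_imp.hyps | th_a.hyps, c[2], _token=_KERNEL_TOKEN)


# Usage: derive |- A -> A from the assumption rule and implication introduction.
identity = imp_intro("A", assume("A"))
```

Because every Theorem value has passed through one of the three rules, new derived rules can be written as ordinary functions over these rules without enlarging the trusted core, which is the extensibility benefit described above.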

Isabelle proof scripts consist of a sequence of proof commands. The proof scripts

can either be written in raw ML (tactical style) or in the Isar language [35], Isabelle's

own scripting language. Isar aims to make proof scripts more readable for humans.

Furthermore, Isar allows one to write in different proof styles: declarative and procedural

[35].

2.3 Proof Editors

Current proof editors can roughly be divided into two groups: text-based general

proof editors mainly aimed at experts, and graphical, special-purpose (i.e. natural

deduction) proof editors aimed mainly at teaching the concept of logical proofs to novice

students. Within the latter group there seem to be several different systems available

with more or less the same functionality. A common pattern within this group is that

the systems are seldom used outside of the educational institution where they were

created. For both groups of proof editors, adoption has been

limited.

There are some user interfaces that have gained reasonable popularity. The most notable is

Proof General [2].

2.3.1 Proof General

Proof General is a generic interface for proof assistants [6] that represents proof and

proof steps in textual representation within the Emacs text editor (see Figure 2.1).

The system is widely used because of its book-keeping functions and its support for a

wide range of theorem provers. It provides to some extent a simpler interaction model

than interacting directly with the prover engine, and supports displaying script language

terms as symbols for easier comprehension (e.g. represent logical implication with →).

It does to a limited extent provide facilities for point-and-click functionality, although

there is very little emphasis on this, and it is only provided when used with the Lego

theorem prover.

[6]

Figure 2.1: Proof General interface to Isabelle

Proof General's main features and emphasis are on script management, script navigation

and performing basic bookkeeping. The major drawback with Proof General from the

user's perspective is that the user interface does little to represent proofs in a more

manageable way for users to understand. It still mainly relies on users inputting

prover-specific commands to a text script. However, it should be mentioned that Proof General

was not designed with the novice in mind and is aimed towards power users as its web

page states [2]. There is work currently being done to reimplement Proof General's

functionality from using Emacs to using Eclipse as the underlying platform [6]. The

motivation for this is to overcome the weaknesses in the current system in terms of

lessening the cognitive burden on the user (associated with the complicated interface

of Emacs), and to improve its maintainability by separating from the close-coupling to

the Emacs Lisp API [47].

2.3.2 Pcoq, LogiCoq and IsaWin

Another proof editor that had previously acquired interest in the interactive theorem

proving community was Pcoq [9] which was an interactive proof assistant for the Coq

theorem prover. It provided the ability to do point-and-click interactive theorem prov-

ing on Coq proof scripts. However, the project has been on hiatus since 2003 and seems

to be no longer active. There also existed a version of Pcoq called LogiCoq [32], which

was accessible through the Web. Unfortunately, this project too no longer seems active.

The IsaWin project [6], a generic graphical user interface for the Isabelle prover with

similar functionality to Pcoq, seems to have experienced the same fate. The commonality

between the aforementioned proof editors is that they are all intended for general

purpose theorem proving, and aimed to facilitate a wider field than that of just natural

deduction proof of higher logic. The user is required to have more specific knowledge

in order to use the proof editor as it is not as tailored for a specific use, nor is the proof

visualization optimized for the proof type at hand. It would be beneficial for users performing

interactive theorem proving to have the proof at hand visualized in a suitable

manner, i.e. box-proofs for natural deduction based proofs or tree/graph-structure for

inductive proofs.

2.3.3 Pandora

Pandora is a tool used for teaching natural deduction in first-order logic (see Figure 2.2).

It is developed at Imperial College London and is used extensively in their undergraduate

introductory course to logic [14]. The system has been in existence since 1996, and the

current version of the system (version 3) runs as a Java applet within a web browser.

It provides point-and-click creation of natural deduction proofs through the use of a

graphical user interface. It renders proofs in box-style/Fitch-style notation, and has

an extensive tutorial and help library built in to guide the user in learning logic. The

system does not rely on using an established theorem prover in the background, rather

relying on its own simple proprietary proof verification engine.

[14]

Figure 2.2: Pandora interface

2.3.4 Jape

One system that does stand out as being different within the group of proof editors

with graphical user interfaces is Jape [10, 11, 12], developed by Richard Bornat and

Bernard Sufrin. It is referred to as a proof calculator rather than a teaching tool,

and was designed as an interactive proof support tool with a high-quality graphical

interface (see Figure 2.3). Jape is different in that it is marketed as a general tool

that can be used to create logic-specific point-and-click interactive proof editors [12].

The proof rules to be available in the proof editor are specified in a simple syntax. For

example, a lecturer teaching logic can specify that only certain rules are available,

thus restricting the students to perform proofs using only these rules.

Although Jape is geared as a tool for creating proof editors, it has so far mainly been

used for teaching logic and is used at a handful of educational institutions. As it was

not intended as a teaching tool, it does not provide any help or advice to users on how

to perform proofs, making it somewhat ill-suited for teaching logic.

It is difficult to place Jape in a clear category as to what kind of proof tool it is. It is

not a teaching tool per se, and the authors state that it is not a theorem prover

either, as it only reacts passively based on user action [12]. However, its authors also

note that the application can be used by experts to develop small-sized logical systems.

As a result, the system fits in between teaching tools and proper proof editors that use

interactive theorem provers.

[10]

Figure 2.3: Jape interface

2.3.5 System Coupling

The consensus is that proof editors have suffered in the past as they were closely coupled

with the theorem prover [6, 39, 30, 43, 46]. The creation of the theorem prover engine

usually had priority and the creators seldom focused on the HCI aspects of the user

interfaces. As a result, the user interfaces had little success. Furthermore, due to

the close coupling between theorem prover and user interface, it would be difficult for

other people to create third-party user interfaces that could interact with the theorem

prover as the designers had not envisioned the use of external user interface applications.

Current research seems to agree that theorem provers and user interfaces should have

a client/server (or distributed) architecture so that they can be loosely coupled. This

opens up the opportunity of having the theorem prover and proof editor running on

different machines [6, 30, 39, 43].

Notable examples of systems that use a client/server architecture are ProofWeb [30, 31],

LogiCoq [32], LΩui [39] and to some extent Proof General for Eclipse [3]. There are as

many frameworks for composite proof architecture as there are implemented systems.

ProofWeb, LΩui and the Proof General project all outline frameworks that could be

used. With the exception of the framework outlined by the Proof General project, where

the so-called Proof General Interaction Protocol is used to interact with the theorem

provers, they have all taken ad-hoc approaches without any aim to standardize the

protocols.

2.3.6 ProofWeb

ProofWeb [30, 31], like Pandora, is intended for use in teaching logic to undergraduate

students. It differs from both Jape and Pandora in that it is provided through a

centralized server, accessible through the use of a web browser. Furthermore it utilizes

a proper interactive theorem prover (Coq) rather than a lightweight proprietary proof

verifier [30, 31]. Its aim is not just to teach logic, but also to teach users how to perform

proper theorem proving (on a real theorem prover rather than a toy system) and how to create

proof scripts [31]. This differs from Pandora, which only aims to teach the concept

of creating logical proofs. ProofWeb does not support creating proofs by point-and-click

and suffers from the limitations of textual user interfaces.

[31]

Figure 2.4: ProofWeb interface

2.4 Graphical Proof Representation

It seems that the issue of graphical point-and-click versus textual proof script creation

is still an open question. Although it is generally accepted that graphical representation

is needed in order to lessen the cognitive burden on users and lower the barrier of

entry [43, 26], expert users of interactive theorem provers can sometimes feel that such

systems hinder their work by forcing limitations upon them. This might be a reason as

to why development in the area has been slow.

A further issue regarding graphical proof representation resides in the display style.

Different kinds of logic proofs benefit from different types of representation style. Thus,

there is no one style that is universally the best style for theorem proving. So far, the

style that is currently applicable to the widest range of proofs (disregarding the tradi-

tional linear representation style) is that of representing proofs as trees. A noteworthy

mention here is HiProofs, which is a specific representation of tree proofs [16]. There

is however a problem with representing proofs as trees as many proofs are cyclic and

are thus more appropriately represented as graphs. Furthermore, HiProofs are not well

suited for larger proofs as they tend to introduce significant branching [43].

For the area of theorem proving in natural deduction, the representation styles used

can roughly be narrowed down into three types:

• Linear representation (Quine-style)

• Representation as proof trees (Gentzen-style)

• Representation involving proof boxes (Fitch-style)

Most of the tutorial systems mentioned in Section 2.3 utilize the box-style notation.

Box-style notation is the notation style used in most logic textbooks and pen-and-paper

proofs [1]. For representing natural deduction proofs it seems to be more appropriate to

use one of the three traditional notation styles. For more general proof editors, however,

this might not be the case.

2.4.1 Fitch-style notation

Fitch-style notation, or box-style notation (see Figure 2.5), is a proof representation

style that is much used within natural deduction theorem proving [1]. It is widely used

in logic text books due to its strengths in making proofs easier to follow and read in

comparison to traditional linear representation. The style gives structure to linear proofs

in terms of boxes, which aims to make it easier for users to keep track of how the proof

is proceeding in comparison to purely linear proofs. It also prevents the need to repeat

assumptions which can greatly clutter a proof. Furthermore, it has the advantage of

being more convenient than Gentzen-style proofs (see Figure 2.6) for representing large

proofs. This is due to its linear nature that prevents it from having the same problem

with branching as Gentzen notation has [31]. In box-style notation, boxes are drawn

around goals to indicate variable scope and to visualize the nesting of subgoals incurred

when working on a proof backwards from the goal.
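To make the box structure concrete, the following is a minimal sketch of how a nested box-style proof could be held in memory and rendered as plain text. The names `Line`, `Box` and `render` are hypothetical and introduced purely for illustration; they are not taken from any of the systems discussed here. Each level of box nesting is drawn as one vertical bar.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Line:
    """A single proof line: a formula and the rule that justifies it."""
    formula: str
    rule: str


@dataclass
class Box:
    """A box opens a new scope; its first line is usually an assumption."""
    items: List[Union["Line", "Box"]] = field(default_factory=list)


def render(items, depth=0, start=1):
    """Return (text_lines, next_line_number) for a list of lines and boxes."""
    out = []
    n = start
    for item in items:
        if isinstance(item, Box):
            # Lines inside a box are drawn one nesting level deeper.
            inner, n = render(item.items, depth + 1, n)
            out.extend(inner)
        else:
            out.append(f"{n:>2} {'| ' * depth}{item.formula}  [{item.rule}]")
            n += 1
    return out, n


# Illustrative proof of A --> (B --> A) using two nested assumption boxes.
proof = [
    Box([
        Line("A", "assumption"),
        Box([
            Line("B", "assumption"),
            Line("A", "copy 1"),
        ]),
        Line("B --> A", "-->I 2-3"),
    ]),
    Line("A --> (B --> A)", "-->I 1-4"),
]

text, _ = render(proof)
print("\n".join(text))
```

The proof remains linear (one numbered line after another), while the bars show which assumptions are in scope for each line.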

[14]

Figure 2.5: Box-style proof

[31]

Figure 2.6: Gentzen-style proof

Chapter 3

Requirements

3.1 Introduction

This chapter outlines the requirements that the target system should aim to address.

The requirements were identified based on the description, aim and objective of the

project, listed in Chapter 1, as well as from the Informatics Research Proposal submis-

sion and the initial project proposal by Jacques Fleuriot [28]. These requirements were

further divided into functional requirements, user interface requirements, and accessi-

bility and performance requirements.

Each requirement identified is listed with the following information:

• An explanation of what the requirement is.

• A justification as to why the requirement was needed, and how it solves the

problem outlined in the project.

The requirements outlined should, as much as possible, be free from information regard-

ing implementation and system design as this will be the target of Chapter 4. However,

it is critical to capture the requirements correctly as they will be used when creating

the conceptual design of the software system in order to ensure that the right system is

being built (to solve the problem at hand).

3.2 Functional Requirements

The functional requirements outline the core functionality that the system should provide

in order to successfully address the outlined aims and objectives.

3.2.1 Box-style Notation

Explanation The system should render formal proof scripts (of natural deduction)

in box-style notation.

Justification Representing proof scripts graphically should make it easier for users

to comprehend the formal proofs that exist in a proof script. Box-style notation is a

proof notation style that is widely used in logic textbooks and when teaching logic at

the university level. As a result, it is a notation style that is widely known and the style

that most novices to logic and theorem proving are first introduced to [1]. The style is

often used when performing pen-and-paper proofs as it gives graphical structure to the

proof being created, which makes it easier for humans to comprehend. It is expected

that users will find it easier to understand the structure of the proof if it is represented

in this notation.

3.2.2 Point-and-click Proof Creation

Explanation The user should be able to create a formal logical proof in natural

deduction without having to know the script language of the underlying interactive

theorem prover or having to perform textual creation of the proof script. The user should

be presented with a limited set of natural deduction introduction and elimination rules,

which can be applied to the proof being worked on by point-and-click mouse gestures.

Justification This requirement should make it easier for users to perform natural

deduction proofs as it removes the need for knowledge about how the specic interactive

theorem prover works.

3.2.3 Store proof scripts.

Explanation The system should provide the user with the facility to persistently

store a copy of the proof script, currently in the buffer, for later retrieval.

Justification The user might want to save an unfinished proof script in order to

return and work on it at a later stage.

3.2.4 Open proof scripts.

Explanation The system should provide the user with the option to open previously

created proof scripts for further work (or review).

Justification The user might want to stop working on a proof script, or to continue

working on an unfinished proof script that has previously been stored. Additionally,

the user might want to go back and review previously completed proof scripts.

3.2.5 Verify proof scripts

Explanation The user should be provided with the facility to run the script in a

sound and widely used/accepted interactive theorem prover, and should be shown

the results obtained after each proof step execution (new subgoals).

Justification The user will want to run the proof script to verify that the proof is

correct so far and to see what the formal proof currently looks like. The user can then

apply natural deduction rules to unfinished proofs if required.

3.3 User Interface Requirements

The user interface requirements are requirements that should address how the graphical

user interface should be designed in terms of features it should provide or general design

principles that should be adhered to. The aim of the user interface requirements is to

make sure that the designed user interface is suitable for the users in terms of usability.

3.3.1 Easy to understand

Explanation The user should be presented with a user interface that is easy to under-

stand, simple to use, and not overly complicated. The user interface should be intuitive

and thus should not require that the user memorizes how to operate the system.

Justification The system should have a low barrier of entry to use. Thus the user

interface should cater for non-experts. By keeping the user interface as basic and intuitive

as possible, the user is less likely to get confused.

3.3.2 Hide theorem prover syntax

Explanation The system should remove the user's need to know how to use the

underlying theorem prover in order to perform formal proofs in natural deduction, and

should hide prover-specific information from the user.

Justification In order to reduce the barrier of entry, the user should not be required

to have any specific knowledge of how to use the interactive theorem prover at hand.

Thus, the system should be as generic as possible in terms of mimicking pen-and-paper

box proofs. The user should be presented with generic, well known natural deduction

rules as they appear in logic textbooks and pen-and-paper proofs rather than

prover-specific syntax and names for these tactics.

3.3.3 Provide help

Explanation The user interface should provide simple helpful clues and indications

as to what the different interactive objects in the user interface do. This means explaining

what each button is used for and, in the case of the buttons for applying a proof rule,

giving a short explanation of what the proof rule looks like.

Justification The clues, coupled with the intuitive and simple user interface requirement,

should make it easier for the user to use the system. It should also remove the

need for an extensive user guide as most of the guidance is provided directly to the

user in the user interface when they are likely to need it, rather than having to look up

information in a separate document (printed or electronic format).

3.4 Accessibility and Performance Requirements

Accessibility requirements refer to how easy it is to begin using the system for interactive

theorem proving, for a new user, in terms of system requirements and time needed in

order to get a system up and running (rather than how easy it actually is to use the

system to do theorem proving). Performance requirements, on the other hand, refer to

how reactive the system is from the user's perspective.

3.4.1 Provide theorem proving remotely

Explanation The system should provide users with the facilities to perform natural

deduction proofs without requiring that software be installed at the user end (other

than having access to a web browser).

Justification The system should have a low barrier of entry to use. Thus, the system

should not require users to spend time installing and configuring software in order to

start performing formal proofs.

3.4.2 Appear to work locally

Explanation Even though the system is accessed remotely, it should be responsive

and behave as if the application is being run on the local system.

Justification Blocking the web browser and thus forcing the user to have to wait idly

until a processing act has been completed can confuse the user and/or make the user

lose attention. Thus, it is important for users to feel that the system is responsive.

Chapter 4

System Specifications and Design

4.1 Introduction

This chapter presents the conceptual design created in accordance with the requirements

outlined in the previous chapter. The main tasks that the user will be able to perform

when using the system (shown in Figure 4.1) are:

1. Load a proof script into the system to work on.

2. Create a new proof script to work on.

3. Apply natural deduction introduction/elimination rules to the proof script in

focus.

4. Run the proof script at hand in an interactive theorem prover.

5. Save the edited proof script to persistent storage.

Figure 4.1: Use-Case Diagram

In order to address the issue of accessibility and making it easy for new users to use the

system, it was decided that a web-based, client-server architecture would be a suitable

one. Following such an architecture means that the two parts can be implemented

separately, using different programming languages and being run on different platforms

as long as they can still use an agreed method of communication. The current trend is

to create systems that do not require the client to install additional software, using only

a web browser to access the functionality of the system. This trend has been apparent

for the better part of a decade, as evidenced by so-called web applications [22]. The

use of this design solution would hopefully minimize the requirements set on the user

in terms of:

• Operating System Architecture - Requires only a web-browser to use, which is

available on all modern operating systems. Thus the system gives the user greater

freedom as to what operating system they can use.

• Need to install software locally - The server will do most of the underlying pro-

cessing, upon the request of the client. The client will utilize the web server's

functions through the use of the web client application run in a web browser. As

a result, there is no need for the client to install any software locally.

• File space - As the proof scripts are stored and run on the server, and there is

no need to install any software locally, the system does not place any file space

requirement on the client-side (apart from regular web-browser cache and/or operating

system swap file).

• Hardware requirements - As the interactive theorem prover is being run on the

server rather than at the client, the system does not put any specific requirements

on the user's system in terms of the amount of memory required and the speed of

the CPU.

There are some consequences of this choice of architecture. One is that it introduces

a requirement that the user needs to have access to a network connection in order to

communicate with the server. Another consequence of a client-server architecture is that

it introduces a time delay of interaction between the client and server which is avoided

(or relatively negligible) in other architectures. These delays can vary widely according

to the network speed of the connection between client and server. However, there are

techniques and designs that can be used in the client-server architecture in order to

minimize the time delay inflicted. Additionally, the aspect of the system that is most

likely to incur the longest time delay is the processing of a proof step application in the

associated interactive theorem prover, which would not be avoided in other architectures

either.

4.2 Technology and External Software

The conceptual system should rely on using a variety of external software and technologies

in order to provide the above-mentioned functionalities. These dependencies and

technologies are explained and justified in this section.

4.2.1 Isabelle

The conceptual system should rely on using the Isabelle general purpose interactive

theorem prover [34] for verifying and processing formal proofs. Isabelle was chosen as

the underlying interactive theorem prover for this project due to its widespread use

within the field of theorem proving, and its general acceptance as a sound system.

Furthermore, the author of this project has previous experience with using the Isabelle

theorem prover and this choice would remove the need for learning the intricacies of a

new interactive theorem prover given the relatively short time-span of the project. For a

description of the Isabelle system itself, see Subsection 2.2.1.

4.2.2 PGIP and Proof General Kit Broker

The conceptual system should utilize the Proof General Interaction Protocol (PGIP) [5]

in order to interact with Isabelle, rather than communicating with Isabelle natively. The

reason for this is that it is difficult to interact directly with Isabelle at the programming

level, as it was designed mainly for direct textual human interaction through a command

terminal. Another motivation for using PGIP is that it is a formally specified XML

language designed to be used for communicating with theorem provers, and was designed

not to be prover-specific. Unfortunately, at the time of writing, no theorem prover other

than Isabelle has a PGIP wrapper.

The system should use the Proof General Kit¹ broker [45] for parsing proof scripts

and issuing commands to Isabelle. This has the additional benefit that PGIP display

commands can be used rather than only PGIP prover commands. This should provide

a more appropriate way of communicating with the theorem prover as the PGIP display

commands are intended to cater for proof editors. Figure 4.2 shows the intended use

of the PGKit broker as envisioned by its authors, Christoph Lüth and David Aspinall

[46].

¹ Henceforth referred to as PGKit.

[46]

Figure 4.2: PGKit broker system architecture

An example of how the PGKit broker can work as a middle-man, between the graphical

user interface and the interactive theorem prover, is shown in Figure 4.3. In the figure,

the PGKit broker translates PGIP display commands (sent from the GUI application)

into PGIP prover commands and sends them to the Isabelle theorem prover. Isabelle's

PGIP wrapper then converts the PGIP prover commands into native Isabelle commands.

Results from Isabelle are passed on by the broker to the GUI application without any

modication. Note that the use of the word command in this scenario refers to system

commands rather than proof script commands. Proof script commands are encapsulated

within the system commands and the contents are copied without alteration between

the translation/conversion stages (e.g. between display and prover commands).
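The encapsulation described above can be sketched as follows. This is an illustration only: the element names `display-cmd`, `prover-cmd` and `payload` are placeholders invented for this sketch and are not element names from the PGIP schema, and the one-to-one translation is a simplification (a single display command may in practice result in several prover commands). The point shown is that the proof script command travels through each stage unchanged while only the enclosing system command changes.

```python
import xml.etree.ElementTree as ET

# A proof script command, as it would appear in an Isabelle proof script.
SCRIPT_STEP = "apply (rule impI)"


def make_display_command(script_text):
    """GUI side: wrap a proof script command in a display-level system command."""
    cmd = ET.Element("display-cmd")        # placeholder tag, not a PGIP name
    ET.SubElement(cmd, "payload").text = script_text
    return cmd


def broker_translate(display_cmd):
    """Broker side: re-wrap the payload as a prover-level system command.

    The payload text is copied verbatim; only the envelope changes.
    """
    prover_cmd = ET.Element("prover-cmd")  # placeholder tag, not a PGIP name
    ET.SubElement(prover_cmd, "payload").text = display_cmd.findtext("payload")
    return prover_cmd


def wrapper_unwrap(prover_cmd):
    """Prover wrapper side: extract the native command text for the prover."""
    return prover_cmd.findtext("payload")


native = wrapper_unwrap(broker_translate(make_display_command(SCRIPT_STEP)))
print(native)
```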

[46]

Figure 4.3: PGIP message exchange

The PGKit broker should also aid the conceptual system with keeping track of modifications

made to the proof script. The PGKit broker is responsible for performing file

operations such as creating, loading and modifying proof script files. Upon the request

to load a proof script, the PGKit broker breaks down the individual lines in the proof

script into objects (which are assigned unique IDs) which it then holds in a local buer.

The broker does this to simplify the process of editing and maintaining a correct image

of the proof script at all times. Each request by the client to modify the proof script

results in an update of the PGKit buer. The changes to the script are then passed to

Isabelle for parsing to verify that the new lines in the script are syntactically correct.

The object(s) added to or modified in the buffer will now be in one of two states: parsed (if

syntactically correct) or unparseable (if syntactically incorrect). Objects in the parsed

state can then be requested to have their state changed to processed, which will trigger

Isabelle to execute the proof step and return the resulting proof state. To retract the

process stage (e.g. in order to edit the object), the object's state has to be reverted to

parsed.
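The life cycle of a buffer object described above can be summarized as a small state machine. The sketch below is a hedged illustration of that description rather than the broker's actual implementation; the class and method names (`ScriptObject`, `process`, `retract`) are hypothetical, and the syntax check that the prover would perform is replaced by a boolean flag.

```python
from enum import Enum


class State(Enum):
    PARSED = "parsed"
    UNPARSEABLE = "unparseable"
    PROCESSED = "processed"


class ScriptObject:
    """One element of the broker's buffer, identified by a unique ID."""

    def __init__(self, obj_id, text, syntactically_ok):
        self.obj_id = obj_id
        self.text = text
        # After parsing, an object is either parsed or unparseable.
        self.state = State.PARSED if syntactically_ok else State.UNPARSEABLE

    def process(self):
        """parsed -> processed (the prover would execute the proof step here)."""
        if self.state is not State.PARSED:
            raise ValueError(f"cannot process object in state {self.state.value}")
        self.state = State.PROCESSED

    def retract(self):
        """processed -> parsed, so that the object can be edited again."""
        if self.state is not State.PROCESSED:
            raise ValueError(f"cannot retract object in state {self.state.value}")
        self.state = State.PARSED


step = ScriptObject(1, "apply (rule impI)", syntactically_ok=True)
step.process()   # state is now processed
step.retract()   # back to parsed, ready for editing

broken = ScriptObject(2, "aply (rule impI", syntactically_ok=False)
# broken.process() would raise ValueError: unparseable objects cannot be processed
```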

This kind of work on the script could not have been done client-side as the system runs

within a web-browser, and is thus not permitted to perform file operations by default,

due to security issues. Furthermore, having the web-client deal with script loading,

buffering and issuing prover commands directly to the Isabelle theorem prover would

certainly slow down the system. This is due to the large increase in messages being

sent over the network to Isabelle (each display message sent results in several prover

commands), as well as making the web client heavier and the code more complicated

which could negatively affect the responsiveness of the system.

4.2.3 AJAX and jQuery

The web-based client should rely on using Javascript to perform asynchronous communication

with the server and to dynamically modify the Document Object Model²

[49] of the web page based upon receiving XML updates. This technique is known as

Asynchronous Javascript and XML (AJAX) and is a recent trend that has appeared

that builds upon the web application concept, utilizing web services [22].

AJAX provides a way to make web applications feel more responsive by hiding the

communication between the client and the server from the user and allowing server

requests to be sent asynchronously. Users can continue to interact with the application

while the AJAX engine deals with the interaction requests. As a result, the client's web

browser is not blocked when server requests are sent. The AJAX engine updates the

DOM model of the rendered web page directly upon receiving results, so there is no

need to re-render the whole web page [22, 30].

As there is no need to re-render the whole screen, the server only needs to return the new

data rather than a completely new web page. This reduces the network overhead, which

results in the system feeling more responsive if there is high latency on the network.

The combination of reduced network traffic, the ability to send messages asynchronously

and updating the DOM directly, should make the web application feel more responsive

and mimic that of a locally run application [22, 30].
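The idea of returning only the new data and patching the existing page can be illustrated outside the browser. In the sketch below, an in-memory XML tree stands in for the rendered page and a small XML fragment stands in for the server's reply; the `update` element, its `target` attribute and the `apply_update` function are assumptions made for this illustration and do not describe the actual message format of the system.

```python
import xml.etree.ElementTree as ET

# Stand-in for the rendered page held by the browser.
page = ET.fromstring(
    "<body>"
    '<div id="rules">impI conjI disjE</div>'
    '<div id="goals">A --&gt; B --&gt; A</div>'
    "</body>"
)


def apply_update(page, update_xml):
    """Patch only the element named by the update; leave the rest untouched."""
    update = ET.fromstring(update_xml)
    target_id = update.get("target")
    target = page.find(f".//*[@id='{target_id}']")
    if target is None:
        raise KeyError(target_id)
    target.text = update.text


# The server replies with the new goal only, not with a whole new page.
apply_update(page, '<update target="goals">A ==&gt; B --&gt; A</update>')
print(ET.tostring(page, encoding="unicode"))
```

Only the `goals` element changes; the `rules` element is never resent or re-rendered, which is the source of the reduced network overhead.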

Several Javascript libraries have appeared lately that aim to simplify the process of cre-

ating AJAX enhanced web applications by providing sets of commonly used functions,

such as traversing the DOM or creating user interface functionality (e.g. drag-and-drop).

² Henceforth referred to as the DOM.

There are many competing libraries such as Prototype.js [40], Script.aculo.us [20] and

YahooUI! [54], which all more or less provide the same functionality. The one chosen

for this project is a relatively new library called jQuery. Its strength lies in the way it

makes it easy to traverse and modify the DOM through the use of selector functions

to locate DOM objects based on XPath expressions [52] and/or Cascading Style Sheet

(CSS) attributes [48]. The core jQuery library itself is quite small compared to the

other aforementioned libraries, but it has a large active community contributing with a

vast library of user created plug-ins.

4.2.4 PHP

The server should be written in the PHP scripting language, and should utilize the PHP5

pre-processor for the Apache web server in order for the client to request functions to

be performed server-side through the use of the Hypertext Transfer Protocol (HTTP)

[51]. There were several reasons why the PHP language was chosen for this project:

1. The server has to be able to accept HTTP requests and reply in XML. The PHP5

pre-processor makes it easy to provide an access point to the service, makes HTTP

parameters sent available as global variables, and makes it easy to format proper

XML output.

2. The server has to interact with a database for storage. Database access methods

are built into the PHP5 core by default, which makes it very easy to implement

code that accesses a database.

3. The server needs to navigate and modify XML data structures. The PHP5 core

includes the SimpleXML functions, which make it easy to traverse the XML DOM.

In comparison, Java's Saxon package for DOM traversal can be quite cumbersome

to utilize as it requires more code to perform the equivalent operation.

4. The server needs to do substantial textual matching and editing. PHP5 has built-

in support for the use of Perl Regular Expression syntax.

5. PHP is a dynamically and weakly typed language, which removes the need for

explicit casting between types. In comparison, Java is a statically and strongly

typed language, which makes it less suited than interpreted scripting languages to gluing

together other applications (cf. the Scripted Components design pattern) [36]. As the

web service needs to perform textual matching and

transformation on the results retrieved by the Isabelle theorem prover, casting to

String type for text character manipulation will be necessary.

6. The PHP scripting language is relatively easy to learn and does not have compli-

cated syntax in comparison to languages such as Java and Haskell. Additionally,

it is well documented with a range of tutorials available in the public domain.

Many of the aforementioned functionalities are available in other programming/scripting

languages. The alternative solutions considered were: Java Server Pages, Python and

Perl.

Java Server Pages was ruled out as it uses Java, which is not well suited for scripting

as it can require relatively more code to perform simple tasks such as file handling and

XML traversal [36]. Furthermore, it is statically and strongly typed as noted above,

which makes it less suitable for scripting.

Python shares a number of strengths with PHP. They are both dynamically typed,

interpreted languages, and have a simple syntax. Python was ruled out on the basis

of its syntax being more complex in general than PHP, and based on the author's

subjective evaluation that it is not documented as well as the PHP language.

Perl is a dynamically and weakly typed interpreted scripting language, similar to PHP.

It can be used in combination with the Common Gateway Interface (CGI) [33] to provide

HTTP access and thus be used as a web application. However, it is notoriously difficult

to debug, and has a generally poor coding style, making it difficult to learn and read

[19].

Thus, based on the analysis above, PHP was selected to be used when developing the

system.

4.2.5 MySQL

The server should store data regarding users, proof scripts and broker and prover in-

stances in persistent storage. For easy access as well as catering for possible future func-

tionality expansion, the system should store this information in a relational database

management system. The database system chosen for this project is MySQL.

MySQL was chosen due to its widespread use and that it is readily supported in PHP5,

as well as the fact that it is open source. A contending option was the use of PostgreSQL

which is similar in terms of being supported in PHP5 as well as being open source. There

are no notable advantages or disadvantages in using one over the other in this project.

MySQL was chosen merely due to its popularity in web application development.

4.3 Conceptual System Design

This section presents a more in-depth description of the conceptual design of the system

outlined in Section 4.1.

4.3.1 System Overview of Architecture

As noted in Section 4.1, the conceptual system has a web-based, client/server architec-

ture. Thus, the system as a whole can be divided into client-side and server-side for

further study.

The server in the architecture should be implemented as a web service that provides an

access point for web clients to utilize the server-side functions. The web service should

thus be responsible for interacting with the prover (through the broker) on behalf of

the client.

The web client should be implemented as an AJAX enhanced web-page, accessible from

a web server and run within the user's web browser, that interacts with the web service

in order to provide the user with the functions outlined in Section 4.1.

The general structure of interaction between the client-side and the server-side of the

system is shown in Figures 4.4 and 4.5.

Figure 4.4: Architectural Overview

Figure 4.5: Sequence Diagram

Figure 4.6 shows a more concrete example of interaction between the different components

in the overall system. The diagram represents the process of performing an

implication introduction (impI) step on a goal.

Figure 4.6: Example Interaction Sequence

4.3.2 Web Service

The web service should deal with the interaction between the user and the broker (and

Isabelle). User actions on the web client trigger HTTP requests sent to the web service.

The web service reacts accordingly to the requests, triggering actions on the broker,

parsing its results into XML and returning the parsed results to the web client for

display. The web service should be written in PHP5 and run on a web server (e.g.

Apache).

The concept of web services is often used with varying meaning. The W3C defines a

Web service as

...a software system designed to support interoperable machine-to-machine

interaction over a network. It has an interface described in a machine-

processable format (specifically WSDL). Other systems interact with the

Web service in a manner prescribed by its description using SOAP messages,

typically conveyed using HTTP with an XML serialization in conjunction

with other Web-related standards. [44]

However, the everyday use of the concept is often somewhat less strict. It is common to

talk about using web services for AJAX requests, even though these web services do not

use SOAP [44] nor the Web Services Description Language (WSDL) [44]. For AJAX

applications, web services usually merely receive simple HTTP requests and reply with

plain, unencapsulated XML documents. The reason for this is that the use of SOAP

encapsulation introduces overhead and requires extra processing client-side which AJAX

designers usually try to minimize. This simplified definition of a web service is the

one that will be used in this project.

The web service in this application should be implemented as a set of PHP files that

act as access points to the underlying services they provide. These services are:

• Open script

• Create script

• Save current script

• Edit current script

• Process script

Core functions for communicating with the broker, accessing the database and parsing broker

responses should be delegated to separate files/classes for re-use, maintainability and

to facilitate testing.
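The shape of such a web service (one access point per service, each replying with plain, unencapsulated XML) can be sketched as follows. The sketch is written in Python for illustration only, whereas the design above calls for PHP files; the action names, the `xml_reply` helper and the reply format are assumptions, and the handlers are stubs that do not contact a broker. A dictionary plays the role that the separate access-point files would play.

```python
import xml.etree.ElementTree as ET


def xml_reply(status, **fields):
    """Build a plain (non-SOAP) XML reply as a string."""
    root = ET.Element("response", status=status)
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def make_stub(service):
    """Create a stub handler; a real handler would call the broker here."""
    def handler(params):
        return xml_reply("ok", service=service, script=params.get("name", ""))
    return handler


# One entry per service listed above.
SERVICES = {name: make_stub(name)
            for name in ("open", "create", "save", "edit", "process")}


def handle_request(params):
    """Dispatch HTTP-style request parameters to the matching service."""
    handler = SERVICES.get(params.get("action"))
    if handler is None:
        return xml_reply("error", message="unknown action")
    return handler(params)


print(handle_request({"action": "open", "name": "demo.thy"}))
```

Keeping the reply construction in one shared helper mirrors the recommendation to delegate common functions to separate files or classes.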

4.3.3 Web Client

The web client application should act as the user interface in which the user can perform

theorem proving. The user utilizes the functions offered by the web service indirectly

through the web client.

The web client should be implemented as a single web page - enhanced with Javascript

code to provide AJAX functionality - that passes requests to the web service. Upon

receiving XML updates from the web service (asynchronously), the Javascript in the

web page will control how the web browser renders the results. The dynamic DOM

modification and the creation of AJAX queries by the Javascript will utilize the jQuery

Javascript library (see Subsection 4.2.3 for further explanation) in order to simplify

coding and to easily apply visual enhancements to the web page rendering.

It should be the responsibility of the Javascript in the web client to build the proof

hierarchy for rendering, as Isabelle output does not contain any hierarchical information.

Thus, it is necessary for the web client to keep track of previous results in order to build

the graphical representation. Furthermore, it needs to dynamically transform XML

results into HTML for insertion into the DOM. The Javascript also needs to interpret

user actions in order to generate HTTP requests to be sent to the web service.

4.3.4 Persistent Storage

The PGKit broker should deal with the actual serialization of the proof scripts. Proof

scripts should be stored as regular Isabelle proof scripts with the .thy extension.

User account details should be stored in the MySQL database. The database should

store information regarding username, home directory, broker instance ID, broker net-

work address, prover instance ID as well as information regarding the proof script at

hand. Since there is relatively little data that is to be stored in the database, it could

be replaced with storing the information in an XML file instead. However, in order

to build in extensibility for unforeseen complications and future expansion, database

access should be used.

To reduce the likelihood of misuse, the system will not allow users to upload proof

scripts as files. To accept uploaded proof scripts, the web service would have to save

the file in its own filespace before sending it to the broker, which would allow malicious

code to be included in the proof script and could compromise the system. This risk is

reduced if the system requires the proof script to be parsed by the broker before it is

stored on the remote service, as the broker will reject non-parseable data and thus

refuse to save the script. Also, since the user is not allowed to directly manipulate the

proof script when using the web application outside of using the set actions provided

by the web application, this should further reduce the likelihood of misuse in terms of

utilizing possible security loopholes in the PGKit broker or in Isabelle itself. However,

it is possible for a user with malicious intent to bypass the web application itself and

interact directly with the PGKit broker. It was deemed that this issue lies outside of the

scope of this project, and is something that should rather be addressed by the PGKit

authors if security loopholes are found. If misuse is a high concern, the system could

be run under a virtual machine or a sandboxed environment to ensure that no harm is

done.

4.4 User Interface

The user interface should be designed in accordance with well-established user interface

heuristics [29] and with the specific user interface requirements outlined in

Section 3.3.

As a brief summary, the user interface should:

• Have a simple and intuitive design.

• Maintain styles consistently.

• Provide clear and easily understandable error messages.

• Provide hints to the user as to how to use the available functions.

Although the user interface design was bound to change substantially during the imple-

mentation stage, a set of mock-ups of the envisioned user interface were made in order

to get a rough guide as to what the proof representation should look like. Figures 4.7

and 4.8 show the envisioned rendering of box-proofs that the system should provide.

Figure 4.7: Draft GUI 1

Figure 4.8: Draft GUI 2

Emphasis was put on mimicking pen-and-paper box proof creation, but at the same time

utilizing the advantages of being able to resize the boxes dynamically. Thus, one can

minimize subgoals in order to remove these from view when concentrating on solving a

specific subgoal.

4.5 Design Summary

The conceptual design of the system should act as a guide during the implementation

of the system. However, implementation stages often involve significant deviations

from the conceptual design, in order to mitigate problems that appear and to more

appropriately address the aims of the project problem at hand.

Chapter 5

Implementation

5.1 Introduction

This chapter covers the implementation stage of the project. It outlines issues that

arose during the project and the design decisions that were taken to mitigate or

overcome these issues.

The implementation followed the conceptual design specified in Chapter 4 as closely

as possible. The high-level conceptual architecture was, to a great extent, adhered to.

However, there were unforeseen complications that resulted in the need to change some

of the details of the system design.

5.2 Web Service

The implementation of the server-side part of the system was more complicated than

initially foreseen.

Notably, it became clear during implementation that the PGKit broker had a number

of severe shortcomings that affected the way the system would communicate and use

its functions. These limitations were not evident beforehand as they were not explicitly

documented nor did they become clear upon reading the documentation regarding the

framework.

There were also complications with the Isabelle theorem prover itself (in terms of needing

to process and transform its output into a suitable format) and issues with the PHP5

pre-processor that affected the way the system was implemented.

In addition to the unforeseen complications due to problems with the dependencies,

there were several design decisions that had to be taken in terms of how one would

most appropriately provide the required functions as per the requirements. All these

issues will be discussed in the next subsections.

5.2.1 PGKit Issues

There were severe complications with using the PGKit [4, 5, 45, 46] broker for this

project. The following is a discussion of the problems experienced, and the decisions

taken to mitigate these.

5.2.1.1 Available version non-working

Both the binary version of the PGKit broker available on the PGKit website [45], and

the version in the PGKit's CVS repository, did not work at the time the project work

started. Although the broker application would start to run, it would fail to parse

the majority of valid Isabelle proof scripts given to it. Additionally, for the few proof

scripts it managed to parse, it would not allow step-by-step, incremental processing of

the script; i.e. it would only allow the whole script to be parsed at once.

Time was spent trying to understand what the problems were and where they lay, by

inspecting log files, trying different CVS versions of Isabelle and recompiling the

PGKit broker from its source code, all without success. The authors of the

PGKit broker, David Aspinall and Christoph Lüth, were contacted directly to enquire

about the problems experienced as no success had been achieved in trying to run the

application. The reply received from Christoph Lüth indicated that the PGKit broker's

code itself was at that time in a non-working state. However, he immediately made

changes to the code so that the PGKit broker again was in a runnable state, and

uploaded the working binary version to the PGKit broker's website. A short extract of

Christoph Lüth's reply is shown below.

I fixed it up a bit (it has suffered some bit-rot over the last few months or

so).

It now works with the latest development version of Isabelle again, or at

least does not fall over quickly [...] You can parse files (both of the examples

you provided), and you can (slowly, cautiously) step through files

It was decided that fixing the PGKit problems ourselves was outside of the scope

of the project, and would likely take a fair amount of time more appropriately spent

elsewhere.

5.2.1.2 Instability

Christoph Lüth's modifications to the PGKit broker's code made the application runnable.

However, it remains unstable. The problems encountered with its stability were:

1. The PGKit broker suffers from timeout errors when creating Isabelle theorem

prover instances. As a result, the PGKit broker might have to be restarted several

times until it manages to handshake properly with the Isabelle theorem prover.

2. The PGKit broker does not flush the script buffer properly. According to the

documentation, it should be able to discard the current file at any stage and start

working on another. However, in practice this does not seem to work. In order

to close a file properly and open a new one, it was necessary to change the state

of all the objects in the buffer from processed to parsed before using the action

to discard an open file. If the file was discarded without ensuring that all objects

were reverted to the parsed state, it would not properly flush the object buffer

upon loading a new file. This caused object-processing requests in the newly

opened file to fail as it was unable to process the first object found in the buffer.

This was due to the object being a remnant from the previous file which had not

been flushed out properly. When such cases occurred the PGKit broker would

deadlock. The only way to get out of this deadlock was to terminate and restart

the broker. In order to overcome this shortcoming, the web service has to keep

track of what file was last opened, and what object ID the first line of the file has

in order to retract to this stage before dismissing the file.

3. The PGKit broker suffers from intermittent unexplained internal errors (i.e. no

error log given nor any other details about what caused the error), which make

the system deadlock, thereby forcing a restart.

4. The PGKit broker does not recover gracefully from invalid process requests. These

situations can be problematic as messages might get lost over the network between

the web client and the web server (or PGKit broker), resulting in the client not

having the correct updated view of the proof script. If the client requests an action

to be performed on a removed object, the PGKit broker will deadlock.

5. The PGKit broker is prone to errors when requesting editing on a range of objects,

such as when users wish to undo a set of rule applications. If new objects have

been inserted into the range after initial loading of the le, as is often the case, and

an edit request is put on the range of objects, the PGKit broker intermittently

fails to properly determine what objects are in the range. In such cases, the

PGKit broker ends up overwriting all objects from the specified start of the

range all the way down to the end of the proof script rather than stopping at the

specified stop position.
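
The workaround described in item 2 can be sketched as a small piece of bookkeeping. The JavaScript below is only an illustration under stated assumptions: the class name, the method names and the action names "retract" and "discard" are hypothetical labels, not PGIP commands, and the real web service was written in PHP.

```javascript
// Remembers what must be known in order to close a file safely:
// the file last opened and the ID of the object holding its first line.
class BrokerFileSession {
  constructor() {
    this.openFile = null;
    this.firstObjectId = null;
  }

  // Record a newly opened file and the object IDs reported for it, in order.
  fileOpened(fileName, objectIds) {
    this.openFile = fileName;
    this.firstObjectId = objectIds.length > 0 ? objectIds[0] : null;
  }

  // Return the ordered steps to perform before another file may be loaded:
  // first retract to the first object, then discard the file.
  closeSteps() {
    if (this.openFile === null) return [];
    const steps = [];
    if (this.firstObjectId !== null) {
      steps.push({ action: "retract", target: this.firstObjectId });
    }
    steps.push({ action: "discard", target: this.openFile });
    this.openFile = null;
    this.firstObjectId = null;
    return steps;
  }
}
```

Keeping the retraction and the discard in one ordered list makes it harder to issue the discard on its own, which is the case that left stale objects in the buffer.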

5.2.1.3 Few PGIP commands implemented

Another setback during implementation was that few of the commands outlined in the

PGIP specification document were actually implemented in the PGKit broker. The

PGKit documentation is quite sparse, and there is no list of what functionality

is currently offered and what is not. As a result, some actions had to be performed

in a different manner in order to achieve results similar to what the missing commands

were supposed to do.

One of the non-implemented PGIP commands was one which is meant to return an

updated list of the proof script objects. As it is, there is no way for a client to ask for

a list of the current proof script after receiving the list when loading the file. To work

around this problem, the web client has to maintain and update a local buffer of the

script, which might get out of synchronization if messages are lost.
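
A local buffer of this kind could be maintained roughly as sketched below. The update is assumed to have already been parsed into plain objects; the field names removed and objects are hypothetical, while id, state and position follow the attribute names of the simplified XML replies used by the system. The meaning given to position here (the new object is placed after the object with that ID, and several new objects sharing a position keep their reply order) is an assumption, not documented behaviour.

```javascript
// Apply one parsed update to the client's local copy of the proof script.
// buffer: array of { id, state, text } in script order.
// update: { removed: [ids], objects: [{ id, state, position, text }] }.
function applyUpdate(buffer, update) {
  // Drop every object the reply marks as removed.
  const result = buffer.filter((o) => !update.removed.includes(o.id));
  // Counts how many new objects were already placed after each anchor.
  const offsets = new Map();

  for (const obj of update.objects) {
    const existing = result.findIndex((o) => o.id === obj.id);
    if (existing !== -1) {
      // Known object: only its state or text changed.
      result[existing] = { ...result[existing], state: obj.state, text: obj.text };
      continue;
    }
    const anchor = result.findIndex((o) => o.id === obj.position);
    const offset = offsets.get(obj.position) || 0;
    // Unknown anchor: append at the end rather than guess a position.
    const index = anchor === -1 ? result.length : anchor + 1 + offset;
    result.splice(index, 0, { id: obj.id, state: obj.state, text: obj.text });
    offsets.set(obj.position, offset + 1);
  }
  return result;
}
```

Because the function returns a new array, a client could keep the previous buffer until the reply has been applied completely, although this does not remove the risk of the buffer drifting out of synchronization when a reply is lost.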

Yet another problem was that some of the commands that changed Isabelle system

preferences were not implemented. For example, it was not possible to enable the PGML-symbols

output preference on Isabelle through the PGKit broker. PGML [7, 46, 5] is a very small

XML language, contained in the PGIP specification, that encloses X-Symbols with XML

tags. In order to overcome this, the local copy of the PGKit broker source code was

modified to enable this function by default and recompiled for use in this project. Note

that PGML output does not turn the whole state display result into a proper XML

structure, as it only wraps XML tags around X-Symbols. The rest of the state display

result is still textual and needs to be parsed/transformed by the web service into an

easily processable format. An example of a PGML marked-up X-Symbol is:

<symbol name="exists">\<exists></symbol>

5.2.1.4 Complexity of PGIP protocol

The complexity of the PGIP protocol, and the sheer size of the XML data being returned

after each PGIP command, made the protocol unsuitable for use by the web client itself,

as it would require extensive in-browser processing and introduce high overhead in terms

of the amount of data having to be sent over the network.

In order to reduce the amount of processing and size of messages being sent to and

from the web client, it was decided that the web server would transform PGIP replies

from the PGKit broker into a much simplified XML language more suited for the web

application's use. An example of the XML result returned to the web client after

creating the line apply (rule_tac [1] conjI) in the proof script is:

    <?xml version="1.0"?>
    <body>
      <remobj id="a275"/>
      <object id="a277" state="parsed" position="a26d">apply (rule_tac [1] conjI)</object>
      <object id="a278" state="parsed" position="a26d">
        <empty/>
      </object>
    </body>

In comparison, the original PGIP message this XML reply was generated from contained

31 XML tags (see Appendix A.1 for a printout of the PGIP message), excluding an XML

document definition element and containing body element tags. The condensed reply

created by the web service, including an XML document definition element tag and

containing body element tags, contains in total only 7 XML tag elements.

In order to reduce the data being sent, information regarding the prover, broker and

display (i.e. the web service) components were taken out as were PGIP envelope tags

as they were of no use for the web client. Additionally, information regarding internal

sequence numbers, timestamps and other superfluous XML elements were also

stripped away in the transformation to reduce the message size.

Furthermore, as the overall aim of the simplified XML language was to reduce the size

of the XML data sent, XML namespace declarations and Document Type Definitions/XML

Schema declarations were not used in the simplified XML protocol.

5.2.1.5 PGIP missing a remove object command

Surprisingly, there is no mention in the PGIP specification of any command to remove

an object from a proof script. This is an essential function that is needed in order to

allow users to retract/remove applied proof steps. In comparison, there are commands

for adding and editing objects.

To work around this limitation, the web service has to edit the object to be removed

so that it contains an empty string. Although this does not physically remove the object

from the PGKit broker's buer, it will not show up in the user display nor in the script

when the broker stores the script persistently. Another solution would be to edit the

broker so as to provide this functionality. However, this would introduce a function

not declared in the PGIP protocol and would ruin the whole point of using a formally

specified protocol.
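
The removal workaround amounts to two small operations: turning a removal into an edit with empty content, and hiding emptied objects from the display. The sketch below illustrates this in JavaScript; the request shape and the function names are hypothetical, and in the implemented system the edit is issued by the PHP web service.

```javascript
// Express "remove this object" as an edit that replaces its text with
// an empty string. The { action, object, text } shape is hypothetical.
function buildRemoveRequest(objectId) {
  return { action: "edit", object: objectId, text: "" };
}

// Objects whose text is empty (or only whitespace) still exist in the
// broker's buffer, but should not be shown to the user.
function visibleObjects(buffer) {
  return buffer.filter((o) => o.text.trim() !== "");
}
```

For example, after the object with ID a275 has been emptied, visibleObjects would omit it from the rendered script even though its ID remains valid in the broker.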

5.2.2 Isabelle Issues

There were also complications with the Isabelle theorem prover itself, in terms of having

to process and transform Isabelle output into a suitable format for program processing

rather than for human reading.

5.2.2.1 Missing XML output feature

It was initially taken for granted that the Isabelle theorem prover provided an option to

make it return state displays (proof results) marked-up in XML syntax for processing,

rather than the usual textual representation of state displays, which is meant for human

reading. This assumption was based on reading the White Paper

on the Proof General Kit [4] that showed a state display marked-up in XML, and on

browsing of the Isabelle source code where a documented output_XML function was

found.

However, upon contacting David Aspinall and Christoph Lüth about having difficulty

enabling the XML output, we were informed that this functionality had been taken out

of Isabelle a while ago. As to the explanation why it had been taken out, Aspinall said

that it "...hasn't received much attention as it makes less sense in the Isar interaction

mode." He did, however, e-mail a snippet of code from an old experimental version

that had the functionality enabled. After spending some time trying to get

the code to work, the attempt was abandoned. The decision to abandon the attempt

to make Isabelle output XML marked-up results was taken due to:

1. The Isabelle code has changed significantly since the function was taken out, thus

it was not possible to merely cut-and-paste the code in.

2. The author of this project does not have any previous experience with the ML

programming language, thus it would be necessary to set aside time to learn the

basics of ML in order to implement the function.

The decision was taken that this diversion was not worthwhile spending time on, and

we opted instead to perform text matching and generate XML at the web server.

An example of the generated XML mark-up of the results, by the web service, is shown

below. Appendix A.2 contains a comparison between the raw PGIP message containing

the raw Isabelle state representation vs. a corresponding XML message generated by

the web service.


    <?xml version="1.0"?>
    <body>
      <result id="a7a">
        <tree step="0" subgoals="1">
          <subgoal id="1">
            <given>
              <atom kind="free">P</atom>
            </given>
            <given>
              <bracket>
                <atom kind="free">P</atom>
                <symbol name="longrightarrow">longrightarrow</symbol>
                <atom kind="free">Q</atom>
              </bracket>
            </given>
            <goal>
              <atom kind="free">Q</atom>
            </goal>
          </subgoal>
        </tree>
      </result>
    </body>
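
The text matching that produces the given and goal parts of such a result can be sketched as follows. This is a simplified illustration in JavaScript rather than the PHP used by the web service, and it assumes the ASCII form of an Isabelle subgoal line, in which several assumptions are written as [| A; B |] ==> C. Nested meta-implications, quantified subgoals and semicolons inside expressions are not handled.

```javascript
// Split one textual subgoal line into its number, assumptions and goal.
// Returns null if the line does not look like a numbered subgoal.
function parseSubgoal(line) {
  const numbered = /^\s*(\d+)\.\s+(.*)$/.exec(line);
  if (!numbered) return null;
  const id = Number(numbered[1]);
  const body = numbered[2].trim();

  // Several assumptions: [| A; B |] ==> C
  const bracketed = /^\[\|\s*(.*?)\s*\|\]\s*==>\s*(.*)$/.exec(body);
  if (bracketed) {
    const given = bracketed[1].split(";").map((s) => s.trim());
    return { id, given, goal: bracketed[2].trim() };
  }

  // A single assumption: A ==> C
  const arrow = body.indexOf("==>");
  if (arrow !== -1) {
    return {
      id,
      given: [body.slice(0, arrow).trim()],
      goal: body.slice(arrow + 3).trim(),
    };
  }

  // No assumptions at all.
  return { id, given: [], goal: body };
}
```

Each entry of given and the goal would then still have to be broken down into atoms, symbols and brackets to obtain mark-up like that shown above.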

5.2.2.2 PGIP communication

The PGKit broker is not responsible for all the PGIP communication problems that

have been experienced. The CVS version of the Isabelle theorem prover used with the

system (dated 4 July 2007, and used on the recommendation of David

Aspinall, as it was compatible with the PGKit broker), reports an error in the PGIP

wrapper code upon starting up a prover instance. As a result, the initial line in the

proof script, that of the theory file declaration, has to be processed twice. This step

must be performed at any time the system backtracks to the first line of the script.

It has not been possible to find a CVS version of Isabelle that is compatible with the

PGKit and at the same time does not have this error.

5.2.3 PHP issues

There were a few implementation issues relating to using the PHP pre-processor, which

are discussed below.

5.2.3.1 Simplexml and mixed content nodes

During the implementation, there were problems with how PHP's Simplexml function-

ality deals with nodes with mixed content. The XML language was initially created for

marking up text documents as mixed content [50]. However, as the Simplexml func-

tionality currently works, one can get either the text content of a node or the sub-nodes

of a node, but not retrieve both at the same time. Additionally, Simplexml would not

dump the content of the node containing mixed content for manual processing, but

would only allow dumping the whole tree from the document root, which ruled out using

Simplexml to navigate the XML tree. As a result, stripping away XML element tags

to get the full content of a node (with mixed content) had to be done by the use of

Regular Expressions rather than simply traversing the XML-tree with PHP's Simplexml

function.
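
The regular-expression approach can be illustrated with the short JavaScript sketch below (the web service itself used PHP regular expressions, and the function name is hypothetical). It assumes that any tag-like text inside the content is entity-escaped; unescaped text such as a raw X-Symbol would be mistaken for a tag and removed.

```javascript
// Return the full text content of a mixed-content fragment by removing
// every element tag and collapsing the remaining whitespace.
function textContentOf(xmlFragment) {
  return xmlFragment
    .replace(/<[^>]*>/g, "") // drop opening, closing and empty tags
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
}
```

This keeps the text in document order, which is the piece of information that could not be obtained together with the sub-nodes through Simplexml.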

5.2.3.2 Security mode restrictions

The DICE system at the School of Informatics provides a web server that staff and students

can use to host web pages. The web server has the PHP pre-processor enabled for

use, thus it would seem that it would be possible to deploy the system there. However,

it became clear that the web server runs PHP5 in safe mode. Running in safe mode

causes file access and socket connection operations to be severely limited. This means

that the PHP scripts cannot access files outside the web directory and, more

importantly, cannot open socket connections to the PGKit broker making the PHP

scripts unable to interact with it.

It was decided that the system had to be developed and tested on a separate system

outside of the DICE system.

5.3 Web Client

The implementation of the web client posed a fair share of challenges that had to be

properly addressed, which will now be discussed.

5.3.1 Creating and Displaying the proof hierarchy

The implementation of the functions responsible for creating and displaying proof hier-

archies involved much deliberation, as it was not straightforward to implement them.

The problems lay fundamentally with how the Isabelle system works.

5.3.1.1 Isabelle does not recall proof history

Isabelle treats each new proof state as a new separate subgoal to prove, from the view

of the client, without any previous history of the whole proof so far. Thus there is no

hierarchical information of the proof structure available for use by the system. In order

to overcome this problem, a design decision was taken that the Javascript in the client's

web browser would build up the proof hierarchy on its own as it receives results. It does

this by keeping track of the current position in the proof script and previously rendered

objects in order to insert the new results at the appropriate place in the DOM. This

requires substantial processing by the Javascript interpreter, but is unavoidable as long

as Isabelle does not have the ability to display the full structure of a proof such as what

is available in Coq [31].

5.3.1.2 Isabelle's subgoal numbering

Isabelle's process of numbering subgoals is difficult to track as the labelled number does

not remain consistent. This makes it very difficult to keep track of the subgoals when

creating the proof hierarchy. This is more appropriately illustrated with an example:

Say we have two subgoals; call the first A and the second B. At the start,

subgoal A has the numeric label 1, and B the numeric label 2. If a backward

rule is applied to subgoal A that introduces two new subgoals (let us call

them subgoal A.1 and A.2), these are now numbered with labels 1 and 2

respectively, and the initial subgoal B now has the numeric label 3.

In order to overcome the problems associated with this, the web application keeps an

updated count of the respective subgoal numbers when creating the proof hierarchy. It

increments subgoal numbers for all higher existing subgoals in the DOM when a new

subgoal is inserted, and likewise decrements the subgoal numbers of all higher numbered

subgoals when a subgoal is closed.
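
The renumbering described above can be sketched as two functions over a map from a client-side subgoal key to the number Isabelle currently uses for it. The keys ("A", "A.1" and so on) and the function names are hypothetical; the implemented client stores this information in the DOM rather than in a separate map.

```javascript
// Replace one subgoal by the subgoals a rule application produced.
// numbers: Map from client-side key to Isabelle's current subgoal number.
function insertSubgoals(numbers, parentKey, childKeys) {
  const base = numbers.get(parentKey);
  if (base === undefined) throw new Error(`unknown subgoal: ${parentKey}`);
  const shift = childKeys.length - 1; // net number of subgoals added
  const next = new Map();
  for (const [key, n] of numbers) {
    if (key === parentKey) continue;
    next.set(key, n > base ? n + shift : n); // move later subgoals along
  }
  childKeys.forEach((key, i) => next.set(key, base + i));
  return next;
}

// Remove a solved subgoal and decrement every higher-numbered subgoal.
function closeSubgoal(numbers, key) {
  const closed = numbers.get(key);
  if (closed === undefined) throw new Error(`unknown subgoal: ${key}`);
  const next = new Map();
  for (const [k, n] of numbers) {
    if (k === key) continue;
    next.set(k, n > closed ? n - 1 : n);
  }
  return next;
}
```

Applied to the example above, replacing A by A.1 and A.2 moves B from label 2 to label 3, and closing A.1 afterwards moves A.2 to label 1 and B back to label 2.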

5.3.1.3 Repeated assumption listings

In relation to the point made that Isabelle does not recall proof history, the system

instead repeats all the assumptions for each new subgoal to prove. This causes problems

when creating the proof hierarchy in the web application, as box-proof notation is

intended to remove the need for re-listing assumptions and overwhelming the user with

superfluous data. It was clear that the web application should prevent the re-listing of

assumptions where the assumptions already exist in the parent states of a proof branch.

This was implemented by making the web client (the Javascript code) check the proof

hierarchy so far so as not to introduce already existing assumptions.
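
A minimal sketch of this check is given below. It assumes that assumptions are compared as text after normalising whitespace, which is a simplification; the function names are hypothetical and the actual client works on DOM nodes rather than arrays of strings.

```javascript
// Normalise an expression so that spacing differences do not matter.
function normalise(expr) {
  return expr.replace(/\s+/g, " ").trim();
}

// Return only the assumptions that are not already present in the
// parent states of the proof branch.
// inherited: assumptions already shown higher up in the branch.
// listed: assumptions Isabelle lists for the new subgoal.
function newAssumptions(inherited, listed) {
  const seen = new Set(inherited.map(normalise));
  const fresh = [];
  for (const assumption of listed) {
    const key = normalise(assumption);
    if (!seen.has(key)) {
      seen.add(key); // also avoids duplicates within the new listing
      fresh.push(assumption);
    }
  }
  return fresh;
}
```

Only the assumptions returned by newAssumptions would then be added to the new box in the proof hierarchy.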

5.3.1.4 Converting from XML to HTML

Before being able to update the DOM with new information generated by a proof step,

the data received in the XML had to be processed and its results converted into HTML.

Although having the web service return HTML instead of XML would remove the need

to convert at the client side, it was decided that it would make more sense that XML

was sent in order to separate display information (HTML) from processing information

(XML), and to facilitate interoperability. This solution would thus allow changes to how

results are rendered to be done at the client side, and allow other client applications to

utilize the web service.

For these reasons, a client-side function was created that transforms XML directly

into HTML, maintaining attribute data (including sub-nodes and text). The function

allowed specification as to what type of HTML tags it was to generate (e.g. DIV

or SPAN). This utility function was used for dynamically loading parts of the web

service response (e.g. expressions) for the elements where client-side processing was not

necessary.
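
A transformation of this kind can be sketched as below. The sketch works on a plain object tree instead of browser DOM nodes so that it is self-contained, and it assumes one particular mapping that the text does not specify: the XML element name becomes the HTML class, and each XML attribute becomes a data- attribute. All names are hypothetical.

```javascript
// Escape text so that it can be placed safely inside HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Convert a parsed XML node into an HTML string using the given tag.
// node: a string (text) or { name, attributes, children }.
function toHtml(node, tag = "div") {
  if (typeof node === "string") return escapeHtml(node);
  const attrs = Object.entries(node.attributes || {})
    .map(([key, value]) => ` data-${key}="${escapeHtml(value)}"`)
    .join("");
  const inner = (node.children || []).map((c) => toHtml(c, tag)).join("");
  return `<${tag} class="${escapeHtml(node.name)}"${attrs}>${inner}</${tag}>`;
}
```

Passing "span" as the second argument would suit inline expressions, while the default "div" would suit block-level parts of the proof.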

5.3.1.5 Javascript timeout

During the testing of the system, a problem came to attention relating to opening

medium-to-large proof scripts, such as scripts containing 5+ theorems with somewhat

long proofs.

When one opens an existing proof script in the web client, the whole proof script is

immediately processed and rendered for view in order to facilitate proof-by-pointing.

However, if the proof script contains numerous complicated proofs, the Javascript will

take some time in calling the web server, retrieving and processing the results, before

eventually creating the proof hierarchy in the DOM. Most web browsers put a time

limit on how long a piece of Javascript code can run before it is suspected to have

deadlocked and warns the user (usually around 10-15 seconds).

If the proof script to be processed is of substantial size, Isabelle might take a while to

process the whole script. Additionally, the Javascript code needs to render the results

which can also take a while. Thus, the Javascript for our system can run longer than

the browser's timeout limit. In such cases the user will usually have to reply to a dialog

asking if they want the Javascript execution to continue, which is a nuisance to the user.

However, this problem is not avoidable as long as the system renders the whole proof at

initial loading. Dividing the processing of the script into chunks of 5-10 objects would

not help in this situation either, as the underlying calling Javascript function still would

take the same time (or most likely longer time than the original solution).

Note that performing proof steps, undoing proof steps and adding new theorems to

prove to an already opened proof script, should not lead to this issue as none of these

steps involve re-rendering the whole script.

5.3.2 Point-and-click Proof Creation

The issues experienced with implementing the point-and-click proof creation will now

be discussed.

5.3.2.1 Available proof rules

The choice of natural deduction rules to provide access to and in what way they are

allowed to be applied (forward, backward or both) was a design decision that required

significant attention. The set of rules chosen, and the way they are allowed to be

applied was decided based on a survey of introductory logic material [10, 13, 27]. All

the standard rules of natural deduction are present. However, there are limitations

on how the different rules are allowed to be applied. These were imposed in order to

not confuse novice users, as many of these rules are complicated and unnatural to use

(though still theoretically possible) in certain ways.

The following proof rules are available to the users of the system (see Appendix B for

definitions of these rules):

Backwards: ∧ε1, ∧ε2, ∧i, ∀ε, ∃ε, ∨ε, =ε1, =ε2, ¬ε, →ε

Forwards: ∧i, →i, ∃i, ∀i, =i, ¬ε, ¬i, ∨i1, ∨i2, classical

Additional rules: PBC (shown in Figure 5.1), LEM (shown in Figure 5.2)

Figure 5.1: Proof by Contradiction [35]

Figure 5.2: Law of Excluded Middle [35]

Some expert users might feel that limiting the number of rules and what way the rules

are allowed to be applied can reduce some of the system's usefulness. Expert users often

rely on using more advanced rules that do not appear as basic natural deduction rules in

logic textbooks, in order to reduce the number of proof steps needed in a formal proof.

However, these rules will in most cases only confuse novice users, as well as making

it more difficult to jump between different interactive theorem provers at a later stage

(we cannot expect non-generic rules to be readily available in all provers). It was felt

that a trade-off had to be made in order to make it useful for complete novices without

confusing them and at the same time be of use to more experienced users.

Note that there are two so-called additional rules that are available for simplifying proofs:

Proof by Contradiction (PBC) and Law of Excluded Middle (LEM). These were added due

to their widespread use in natural deduction proofs.

5.3.2.2 Proof dependencies and reuse

Ideally, we would like to have the ability to use proved theorems in other proofs in

order to reduce the proof steps needed. However, due to the complication of converting

theorem declarations in Isabelle from meta-level to object level, this was omitted. If

all theorems were forced to be declared at the object level, then re-use would not be

problematic, as the cut_tac command in Isabelle (that enables proved theorems to be

asserted as new assumptions into a goal being proven), is already used for the LEM rule

application which is declared at the object level.

5.3.2.3 Isabelle's lack of labelling

Isabelle does not allow any labelling of expressions in the proof script. This is un-

fortunate, as the box-style notation relies on the use of labels for human readability

(see Subsection 2.4.1). It was deemed that labelling was critical in order to maintain

the benefits sought by using box-proof notation, thus a solution had to be found that

provided labelling. The solution devised was that the Javascript run at the client has

to create unique labels for assumption expressions so as to make it easier for users to

follow the box proof. However, this was not deemed necessary for goal expressions, as

there would only be one single goal to prove at the different levels in the proof, thus

this would not cause any confusion as to what goal a step is working on.

Although this solution solved the problem of labelling expressions, it does not success-

fully address how to properly provide explanations as to what assumptions a proof rule

worked on to arrive at a new assumption (in case of forward steps) or goal (in terms

of backward steps). For example, using ¬ε forward on assumptions labelled 1 and 3 in order to

arrive at the new assumption labelled 4 would be explained as ¬ε(2, 3) in box-proof

notation. The problem is that since labelling is not used in the proof script itself, the

system has no way to link the variable instantiations in the Isabelle proof rules to the

existing assumptions. In fact, instantiating variables in Isabelle proof rules can only be

done by explicitly stating the binding rather than referring to some label. For instance,

applying the ¬ε rule forwards on the assumptions P and ¬P in order to arrive

at False would require the command apply (frule_tac [1] P="P" and R="False" in notE).
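
Since bindings have to be stated explicitly, the client must assemble such a command from the expressions the user selected. The JavaScript below sketches one way to do this; the function name and the shape of the bindings argument are hypothetical, and the command syntax is limited to the two forms that appear in this report.

```javascript
// Build an Isabelle apply command for a rule application.
// tactic:   e.g. "rule_tac" or "frule_tac"
// subgoal:  number of the subgoal the rule is applied to
// bindings: object mapping rule variables to expressions (may be empty)
// rule:     name of the Isabelle rule, e.g. "notE"
function buildRuleCommand(tactic, subgoal, bindings, rule) {
  const parts = Object.entries(bindings).map(
    ([variable, expr]) => `${variable}="${expr}"`
  );
  // With bindings:    apply (frule_tac [1] P="P" and R="False" in notE)
  // Without bindings: apply (rule_tac [1] conjI)
  const instantiation = parts.length > 0 ? ` ${parts.join(" and ")} in` : "";
  return `apply (${tactic} [${subgoal}]${instantiation} ${rule})`;
}
```

Note that nothing in the generated command records which labelled assumptions the expressions came from, which is the limitation discussed in this subsection.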

Figure 5.3: Applying forward rule

One way to utilize proof rule explanations is to capture information as to the labels of

the selected assumptions in the web client when a user applies a rule (shown in Figure

5.3). This information can then be added as an explanation to what lines the proof rule

worked on. However, due to the aforementioned lack of a labelling mechanism. this

information is not incorporated into the proof itself as it is not Isabelle-relevant, and

so would be lost when re-loading the le. This would also aect the user any time an

existing proof script was opened as it would not contain proper explanations other than

the name of the rule applied .

Two solutions were identified:

1. Add Isar text comments to the proof script at the time of the rule application,

indicating the labels of the assumptions used with the rule to be extracted at a

later time.

2. Create a data-structure within the Isabelle proof script that would contain a list

of expressions with assigned labels. This data-structure would then be used to

annotate unlabelled expressions by iteration and comparing content, before being

added to the proof hierarchy at the web client.

Neither of the above solutions is ideal. The first solution is the easiest to implement. It is lightweight and involves little processing, and it adds little overhead to the messages being sent to the server. However, it does not use the added explanations for anything other than displaying their text content in the DOM, and if the web application's labelling system is changed, the labels in an explanation might no longer refer to the right expressions. The second solution is heavy-handed and requires extensive processing on both the client and the server side, both to ensure correctness and to make use of the data structure.

Upon weighing up the strengths and weaknesses of the two solutions, it was decided that lightweight comments would be sufficient: they would take less time to implement, would be faster for the system to process, and would make no difference to the information expressed in the explanation shown to the user.

As a side-note, a theorem prover such as Coq, which supports the use of labelling, would

not have incurred this problem.

5.3.2.4 Replaying proof

Related to the issue of labelling is that of replaying proofs upon opening a theorem file. Upon loading a file, the web client requests that the whole script be parsed in order to create the proof hierarchies needed to perform point-and-click proofs. The proof hierarchy is created by traversing the returned proof results line by line and reacting to each accordingly (following rules as to what is added to the DOM). Without the labels being committed to the proof script itself, the client would not be able to determine each rule application's side explanation in terms of the assumptions used as premises.

To overcome this, the system was implemented so that if a comment is found following

the proof step in the script, then it uses that line as explanation (similarly to what it

would do when performing point-and-click rule applications). If there is no comment

line, then it falls back to only giving the proof rule name used. This way the system

works for scripts that do not have comments as well, falling back to somewhat less

informative explanations. An example of a commented line is

apply (frule_tac [1] P="P" and R="False" in notE) (*¬E (3,1)*).
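The fallback behaviour described above can be sketched as follows. This is a minimal illustration only, not the thesis implementation: the function name parseProofStep and the regular expressions are assumptions, and the sketch only handles a single trailing (* ... *) comment on the same line as the proof step.

```javascript
// Hypothetical helper: split a proof-script line into its Isabelle command
// and the optional trailing (* ... *) comment used as the side explanation.
function parseProofStep(line) {
  const text = line.trim();
  const match = text.match(/^(.*?)\s*\(\*\s*(.*?)\s*\*\)\s*$/);
  const command = match ? match[1] : text;
  // Fallback: take the rule name from "... in <rule>)" or "(<method> <rule>)".
  const ruleMatch = command.match(/(?:\bin\s+|\(\w+\s+)(\w+)\)\s*$/);
  const ruleName = ruleMatch ? ruleMatch[1] : null;
  return {
    command: command,
    // Prefer the comment; otherwise fall back to the bare rule name.
    explanation: match ? match[2] : ruleName
  };
}
```

For a line without a comment, such as apply (rule impI), the sketch reports only the rule name, which mirrors the less informative explanation described in the text.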

5.3.2.5 Instantiation of variables in proof rules

During the implementation, a decision had to be taken as to what extent the system should instantiate variables in proof rules (this issue is linked to the discussion of labelling in the previous section). Usually, in the Isabelle community (and most other theorem prover communities), it is considered bad style to provide more explicit instantiations of variables than necessary. This is due to the potential effects a change in one proof step can have on subsequent proof steps. When variable instantiation is kept to a minimum, the Isabelle theorem prover can instantiate these variables internally, searching for instantiations that satisfy the rule conditions. Again, if Isabelle supported labelling of expressions, this would not be as much of an issue.

In our system, the decision was taken to instantiate as much as possible in terms of the

rule's assumptions. It would not be necessary to specify the goal variable of the proof

rule as using appropriate Isabelle tactics would specify what subgoal to work on (and

thus ensure that only one goal is ever in question).

Further complications included the need to use λ-expressions in the variable instantiations for the ∃ε and ∀i rules when the expression contains multiple quantifiers. For example, in order to apply the ∃ε rule forward on the assumption ∃y. ∀x. P x y, the Isabelle command would be

apply (rule_tac [1] P="\<lambda> y . \<forall> x . (P x y)" in exE).
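As a rough sketch of how a client could assemble such commands from the user's selections, the helper below concatenates a tactic name, a subgoal index, explicit variable bindings and a rule name. The function name buildTacticCommand and its parameters are hypothetical; the sketch performs no validation of the bindings and assumes they are already in Isabelle syntax.

```javascript
// Hypothetical helper: assemble an Isabelle tactic command string from a
// tactic name, a subgoal index, a rule name and explicit variable bindings.
function buildTacticCommand(tactic, subgoal, rule, bindings) {
  const insts = Object.keys(bindings)
    .map(function (name) { return name + '="' + bindings[name] + '"'; })
    .join(' and ');
  // Only emit the "... in" part when at least one variable is instantiated.
  const instPart = insts ? insts + ' in ' : '';
  return 'apply (' + tactic + ' [' + subgoal + '] ' + instPart + rule + ')';
}
```

Calling buildTacticCommand('frule_tac', 1, 'notE', { P: 'P', R: 'False' }) reproduces the notE command used in the labelling discussion.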

5.3.2.6 Closed subgoals

Isabelle does not notify users when a branch (a subgoal) of a proof is closed. The only indication that a subgoal has been closed is obtained by counting the number of subgoals in the current proof state and comparing it to the number in the previous proof state to check whether the count has decreased. As a result, the functions in the Javascript code responsible for creating the proof hierarchy have to track subgoal counts to determine branch closures.
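The count comparison can be sketched as a small stateful tracker. The name createSubgoalTracker is an assumption made for illustration, and the sketch shares the limitation of the approach itself: a step that closes one subgoal while opening another leaves the count unchanged and is not reported as a closure.

```javascript
// Hypothetical tracker: compares the subgoal count of each new proof state
// with the previous one and reports how many branches appear to have closed.
function createSubgoalTracker(initialCount) {
  let previous = initialCount;
  return function update(currentCount) {
    // A decrease in the count is the only available sign of a closure.
    const closed = Math.max(0, previous - currentCount);
    previous = currentCount;
    return closed;
  };
}
```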


5.3.2.7 Closing subgoals

Although Isabelle allows working on subgoals in any order, it does not allow users to readily close subgoals non-linearly (more specifically, by applying an assumption). It does not have a command that enables a one-step closure of a subgoal lower down on the stack. In order to remedy this, the system needs to introduce a lemma that mimics closure of a proof branch and that can be applied as a regular rule, allowing it to be targeted at specific subgoals. The disadvantage of this solution is that the introduced lemma has to be added to every proof script that is to be used in the system.

Another issue regarding closing subgoals is that some rule application commands in Isabelle are destructive (notably erule and drule), in that they can delete the assumptions they use as well as close a subgoal implicitly. In order to maintain consistency, the system only permits the use of Isabelle's backward and forward commands (the exceptions are the assumption lemma, which requires a destructive command, and the cut_tac command used in the LEM rule application). By removing the destructive commands, and attaching assumption applications to the allowed commands where needed, there should not be any cases where a subgoal is closed without the user specifically closing it.

5.3.2.8 Undoing a proof step

The issue of undoing/removing proof steps also required consideration. The problem

at hand was answering the question as to how far in the proof script an undo function

should revert to. Due to the linear nature of proof scripts, rule applications are added

one after the other to the script, disregarding what subgoals they are applied to.

Removing a range of objects from a given position to the end of the theorem proof

is problematic in that it can remove proof steps that are not related to the subgoal

branch that the user wants to retract actions from. i.e. there are dependencies in the

underlying data-structure that do not relate to the proof representation itself.

Deleting just a single step at a time is problematic as well, as the subsequent steps in the script would then be incorrect (e.g. the subgoal numbering would be wrong).

It was determined that the best solution would be to delete from the position of the step to be undone all the way to the last line of the theorem proof. However, this side effect might not make much sense to the user (i.e. the user might be confused as to why removing proof steps in one branch of a proof also removes steps applied to another, unrelated branch of the proof).
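The chosen undo behaviour amounts to truncating the linear list of proof steps. The sketch below assumes, purely for illustration, that a theorem's proof is held as an ordered array of step strings; the function name undoFrom is hypothetical. Returning the removed steps as well would let the client tell the user which steps in other branches were discarded.

```javascript
// Hypothetical model: a theorem's proof is an ordered array of step strings.
// Undoing the step at `index` discards it and every later step.
function undoFrom(steps, index) {
  if (index < 0 || index >= steps.length) {
    throw new RangeError('No proof step at index ' + index);
  }
  return {
    kept: steps.slice(0, index),   // steps that remain in the script
    removed: steps.slice(index)    // the undone step and all later steps
  };
}
```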

5.3.2.9 Web browser compatibility

Dealing with browser compatibility is a complicated issue. Different web browsers implement the W3C Web standards (grouped under the W3C Document Object Model Architecture heading) [53] differently, and vary in how much of each specification they follow. Below is a list of four web browsers (each with a different layout engine) and how well the system works with each.

1. Mozilla Firefox 1.5 and 2.0 - All the functionality of the system is available, and

the web page is rendered correctly as envisioned. The system was designed with

this browser in consideration.

2. Apple Safari 3.0 - Most of the functionality is available and working. The only flaw when using the web application in this browser is that the tool-tip hints meant to help the user do not appear. This is likely due to a documented problem with how the Safari browser deals with CSS overlays in terms of z-index levels. Apart from this relatively minor issue, the rest of the web page is functional and renders correctly.

3. Opera 9.0 - Most of the functionality is available. However, there are issues with how the browser deals with the overflow CSS property. As a result, there are rendering problems where HTML tag content overflows past its box and overlaps content in sibling tag boxes. This is especially evident in the buttons on the menu, and where rounded corners crop their text content.

4. Microsoft Internet Explorer 7.0 - As it is, the system is not usable in this web browser. There are several reasons for this, listed below:

(a) W3C's CSS standards [48] are not strictly followed. Several CSS properties are not supported, and the browser relies instead on non-standardized CSS properties specific to Internet Explorer.

(b) Non-standard XML processing. Internet Explorer's XML parser works somewhat differently from the other browsers' XML parsers in terms of how it interprets text nodes and node attributes.

(c) Plug-in incompatibility. Although the jQuery Javascript library is compat-

ible with Internet Explorer, several of its plug-in libraries are not. This is

mainly due to the problems relating to CSS and XML processing, as men-

tioned above.

(d) Lack of proper tools to debug code and navigate the DOM for inspection, like the Firebug plug-in available for Mozilla Firefox.

Attempts were made to remedy the problems associated with the Internet Explorer web browser. However, given the difficulty of debugging the browser's Javascript environment and inspecting the DOM, it was decided that the time this would require was not worthwhile within the limited time span of the project. Additionally, problems with the Internet Explorer browser in terms of AJAX web applications seem to occur frequently (e.g. ProofWeb does not support it properly either) [31].

5.3.3 Message Ordering

There were complications involved with controlling what order messages are sent and

processed in the web application. Ideally, we would like the message passing at the

client side to be asynchronous so as to keep the system from locking. However, the

procedural stateful communication required by the broker comes as a contrast to the

stateless asynchronicity aimed at for the web client.

A solution had to be devised that would allow ordering of messages where needed,

without forcing the whole system to block as with pure synchronous messaging.

5.3.3.1 Initial solution

The initial solution was to create recursive functions that would deconstruct a supplied

array containing AJAX call requests. Upon reaching an array containing a single ele-

ment, the AJAX call contained would be called, and the results returned to the parent

recursive call. The parent recursive call would then trigger its AJAX call, and pass the

results so far upwards. This solution was both process intensive and cumbersome, and

degraded the responsiveness of the system.

5.3.3.2 Revision 2: callback functions

The next improvement was to use callback functions. The jQuery Javascript library

allows AJAX calls to have associated callbacks that are executed upon completing

and receiving a reply to a query. This allows nesting of AJAX calls in the callback

functions to ensure that calls were executed and processed in a certain order. However,

this solution did not work well for the situations where you make calls procedurally in

loops, as there would be no guarantee that a query executed in the first iteration would

complete processing its callback function before the one for the second iteration.

5.3.3.3 Revision 3: ajaxQueue plug-in

Relatively late in the implementation stage (8 August 2007), the source code of a new jQuery plug-in called ajaxQueue (written by the jQuery library's main author) was released on the jQuery message board. This allowed AJAX calls to be queued, ensuring that one AJAX call (and its callback function) would complete before the next one in the queue was executed. The plug-in enforced this ordering without blocking the web browser, as synchronous messaging would. The queueing could be bypassed, if required, by making regular AJAX calls.

This plug-in solved several of the issues experienced with message ordering. It was

unfortunate that this plug-in did not appear until late in the project, when much time

had been spent on solving the message ordering. The web client code was changed to

utilize ajaxQueue, which resulted in reasonable improvements to processing time.
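The queueing idea can be illustrated without reference to the plug-in's actual code. The sketch below is not ajaxQueue and does not use its API: it swaps in modern JavaScript promises, which were not available to the 2007 implementation, and the name createRequestQueue is hypothetical. Each queued task starts only after the previous one has settled, while the browser itself is never blocked.

```javascript
// Hypothetical queue: each task is a function returning a Promise. A task
// starts only after the previous task (and its callback) has settled.
function createRequestQueue() {
  let tail = Promise.resolve();
  return function enqueue(task) {
    const result = tail.then(function () { return task(); });
    // Keep the chain alive even if a task rejects.
    tail = result.catch(function () {});
    return result;
  };
}
```

Calls that must not wait for the queue can simply be issued directly, which corresponds to bypassing the queue with regular AJAX calls as described above.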

5.3.4 User Interface

The implementation of the user interface aimed to keep the design simple so as not to

confuse the users. The user interface underwent rapid and incremental changes in order

to improve its usability, based on weekly discussions with the project supervisors. A

screenshot of the system is shown in Figure 5.4 for reference in the following discussions.

Figure 5.4: Guide to user interface

5.3.4.1 Expanding proofs

Web pages naturally expand vertically rather than horizontally. It is thus difficult to cater for the horizontal expansion that box-proofs might introduce, especially as HTML DIV elements need to have specified widths set for most web browsers to render them properly. As the content of a DIV grows, the DIV itself will not normally expand in width to accommodate it, as the width is already set. It is possible to grow a DIV element by setting the overflow CSS property to auto, which will override the set width if necessary to cater for growing content. However, the Mozilla Firefox web browser does not deal with this CSS property gracefully, inflicting a 2-3 second browser refresh delay each time a DIV width needs to be dynamically changed due to overflow. This is a noticeable lag that degrades the feeling of responsiveness.


As a result, the amount the box-proof can expand horizontally is limited. In the case of

a large number of nested boxes in the proof, the web client has a function that allows

the user to switch rendering styles so that sibling boxes are introduced vertically rather

than horizontally.

5.3.4.2 Colour scheme

The colour scheme underwent several changes. The initial test versions of the web client that were demonstrated used somewhat dark colours, which made the system difficult to focus on. As a result of the discussions, the colour scheme was revised to use lighter shades and complementary colours.

Furthermore, the user interface was tested for appropriateness for users suffering from colour blindness using Vischeck [17], a vision simulator tool. The system was checked against deuteranope, protanope and tritanope vision, which are the most commonly occurring colour deficiencies. The result of the test was that the system was deemed fully usable for users suffering from these colour deficiencies.

5.3.4.3 Help clues

The system provides hints to the user as to what the buttons in the user interface do.

In the case of objects that apply proof steps, the underlying natural deduction rule is

displayed in a small box next to the button. Some further hints are given about how

to apply the rule in the system, as indicated by the note shown in Figure 5.5. This

should prevent the need for users to refer to a separate user guide in order to understand

the rule's use.

Figure 5.5: Help clue

5.3.4.4 Box hierarchy

The user interface utilizes drop-shadows on proof boxes so as to give the user a sense of

depth/height visualization in order to emphasize the proof hierarchy. This should aid

the user in getting a feeling for where they are in a proof.


5.3.4.5 Menu panel vs. Toolbar

Two different layout styles for the buttons were considered. One was to group clickable actions into drop-down menus located in a horizontal toolbar at the top of the screen. The other was to group actions into toggleable sub-panels in a movable accordion menu (a menu that expands and contracts as menu headings are opened and closed).

The solution decided upon was to use the accordion menu, shown in Figure 5.6. This mimics the layout used in systems such as Adobe Photoshop, where tool buttons are visible and easily accessible. Additionally, this menu can be moved around on the screen to suit the preference of the user (by default it is placed on the left-hand side of the screen). The Photoshop-like menu was further enhanced by providing the ability to hide away groups of actions that are not currently used (toggleable panels). It was felt that this design would go better with the layout of the rest of the user interface, and would be quicker to use than a horizontal toolbar (panels of similar buttons are kept open instead of the user having to navigate the nested structure each time).

Figure 5.6: Menu panel

5.3.4.6 Layout of proofs

Two different layout styles for rendering the proof script were considered. One approach was to place each theorem proof in the script in a separate internal page tab. Another was to render all proofs in a scrollable window. It was felt that dividing the script into tabs would make the system overly complicated for the user (e.g. more interface objects to learn and more clicks), and would involve more Javascript processing client-side, which might reduce the responsiveness of the system. Thus, the decision was to lay out theorem proofs as a scrollable script.


5.3.4.7 Viewable proof script

It was decided that it should be possible to view the underlying Isabelle proof script, if the user so wished. This would be beneficial for a user who wished to gradually learn Isabelle proof commands. However, the script is hidden from the user by default, and the user is not required to have any knowledge of Isabelle to use the system. An example rendering of the proof script is shown in Figure 5.7.

The box-proof is shown on the right, and the corresponding Isabelle script on the left.

Figure 5.7: Proof script

5.3.4.8 Graphical confirmation of finished proofs

Users should be given confirmation that a proof has been finished. However, this should be done in an unobtrusive way. The solution taken was to colour completed proofs green (as opposed to the grey colour of proofs that are not completed). This was meant to utilize the common human association of the colour green with "OK", or with something being correct or allowed (e.g. in traffic lights).

5.3.4.9 Show/hide proofs and proof branches

The ability to maximize and minimize proof and proof branch boxes should make it easier to hide away information that is not currently relevant to the user, thus reducing the information overload that might otherwise be inflicted. Examples are minimizing completed proofs and proof branches not currently being worked on.


Additionally, the ability to maximize and minimize was extended to be used on user

interface dialogs as well, such as the main menu panel and the dialog for viewing the

actual Isabelle proof script text.

5.3.4.10 Meta-variables

Initial implementations included Isabelle variables quantified at the meta-level in the rendering of the proof scripts (e.g. when applying ∃ε). However, it was felt that this was not necessary and would only confuse users, as such variables are Isabelle-specific. Thus, their display was taken out of the system.

5.3.4.11 Drag-and-drop vs. clickable expressions

Applying forward proof rules often involves selecting premises (assumptions) that have appeared in the proof so far. Our initial solution for applying proof rules was to:

1. Make the user press the button of the rule to be applied, resulting in a pop-up

box appearing.

2. The user would click and drag the premises to be used as rule parameters and drop them in fields appearing in the pop-up box.

3. Finally the user would click a button to apply the rule.

However, upon discussion, it was decided that this process was too tedious. The solution

decided upon and implemented instead was to:

1. Make the user click on the premises to be used as rule parameters. The selected

premises would be marked in red (shown in Figure 5.8) with a number appearing

to indicate the order they were selected (i.e. if the second assumption were to be

selected next, it would be labelled 2 in a red box).

2. The user would click the button of the sought rule to apply it.

This solution would reduce the number of clicks necessary in order to apply proof rules,

and also reduce the possibility of misunderstanding how to use the system.
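The click-to-select scheme can be sketched as a small model that remembers the order in which assumptions were clicked. The name createPremiseSelection is hypothetical, and the behaviour of deselecting an assumption by clicking it a second time is an assumption of this sketch rather than something stated in the text.

```javascript
// Hypothetical selection model: remembers the order in which assumption
// labels were clicked. Clicking a selected assumption again deselects it
// (an assumption of this sketch).
function createPremiseSelection() {
  const order = [];
  return {
    toggle: function (label) {
      const index = order.indexOf(label);
      if (index === -1) { order.push(label); } else { order.splice(index, 1); }
    },
    // Number shown in the red box next to an assumption, or null if unselected.
    badgeFor: function (label) {
      const index = order.indexOf(label);
      return index === -1 ? null : index + 1;
    },
    // Labels in selection order, ready to be passed as rule parameters.
    labels: function () { return order.slice(); }
  };
}
```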


Figure 5.8: Selecting assumption

5.3.4.12 Selecting subgoal to work on

In order to select a subgoal to work on, the user has to click the empty area within a

proof box. Earlier versions of the implementation involved only having to hover over

a box in order to select it. However, this was problematic when wishing to apply a

proof rule as navigating to the menu could involve temporarily hovering over another

subgoal, thus inadvertently selecting it. Forcing the user to click the subgoal would not

have this problem.

5.3.4.13 Showing natural deduction rule names

In order to keep in style with hiding away prover syntax, the side explanations of how a

statement was arrived at should use the general rule names of natural deduction rather

than Isabelle rule names. This means that the system has to transform the generated explanations before inserting them in the DOM for display.
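One simple way to perform such a transformation is a lookup table from Isabelle rule names to display names. The table below is an illustrative assumption: it covers only a handful of standard Isabelle/HOL rule names, the display strings are chosen for the example, and the function name displayRuleName is hypothetical.

```javascript
// Assumed mapping from a few Isabelle/HOL rule names to natural deduction
// names; the thesis does not list the table actually used.
const RULE_DISPLAY_NAMES = {
  impI: '→i',
  notE: '¬E',
  allI: '∀i',
  exI: '∃i',
  exE: '∃E'
};

// Returns the display name, or the Isabelle name itself if none is known.
function displayRuleName(isabelleName) {
  return Object.prototype.hasOwnProperty.call(RULE_DISPLAY_NAMES, isabelleName)
    ? RULE_DISPLAY_NAMES[isabelleName]
    : isabelleName;
}
```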

5.3.4.14 Using mathematical symbols

When rendering proof results received from Isabelle (after first transforming them from XML to HTML), the system should express the X-Symbol instructions with proper logical characters rather than the text given by the prover (e.g. <span class="symbol" type="forall">forall</span> should be displayed with the character ∀ in the client's browser). In order to accomplish this, the Javascript code has to replace the content of all symbol tags with the respective HTML character codes for display.
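A minimal sketch of this replacement, operating on an HTML string rather than on the live DOM, is given below. Only the forall type appears in the text; the other type names in the table, and the function name renderSymbols, are assumptions made for illustration.

```javascript
// HTML character entities for symbol types. Only "forall" is taken from the
// text; the remaining type names are assumed.
const SYMBOL_ENTITIES = {
  forall: '&forall;',
  exists: '&exist;',
  not: '&not;',
  and: '&and;',
  or: '&or;'
};

// Replaces the text content of every symbol span with its character entity.
// Spans with an unknown type are left untouched.
function renderSymbols(html) {
  return html.replace(
    /(<span class="symbol"\s*type="(\w+)">)[^<]*(<\/span>)/g,
    function (whole, open, type, close) {
      return Object.prototype.hasOwnProperty.call(SYMBOL_ENTITIES, type)
        ? open + SYMBOL_ENTITIES[type] + close
        : whole;
    }
  );
}
```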

The system also relies on being able to convert character symbols directly into Isabelle's Isar representation. This is needed when users wish to declare new theorems to prove. Again, in keeping with the style of hiding away prover syntax, the user should only be required to deal with proper logical symbols when entering the theorem expression. In order to simplify entering these symbols, the user should be presented with clickable buttons that add these symbols to the text area used for declaring a theorem.
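The reverse conversion can be sketched as a character-by-character substitution into Isabelle's \<name> symbol notation. The function name toIsabelleSyntax is hypothetical, and mapping the typed arrow to \<longrightarrow> assumes that it denotes object-level implication in Isabelle/HOL.

```javascript
// Mapping from logical characters to Isabelle symbol notation. Mapping the
// arrow to \<longrightarrow> is an assumption (HOL object-level implication).
const TO_ISABELLE = {
  '\u00AC': '\\<not>',
  '\u2200': '\\<forall>',
  '\u2203': '\\<exists>',
  '\u2227': '\\<and>',
  '\u2228': '\\<or>',
  '\u2192': '\\<longrightarrow>'
};

// Replaces each known logical character; all other characters are kept as-is.
function toIsabelleSyntax(expression) {
  return Array.from(expression).map(function (ch) {
    return Object.prototype.hasOwnProperty.call(TO_ISABELLE, ch) ? TO_ISABELLE[ch] : ch;
  }).join('');
}
```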

5.3.4.15 Right-click undo

The user interface should allow users to undo steps in the proof (for both completed and

uncompleted proofs) by right-clicking on the proof step. A context menu should appear

with the choice to undo this step. In order not to confuse the user, right-click should be disabled for the rest of the user interface to prevent the default browser right-click menu from appearing.

5.3.4.16 Customizability

The user interface should be dynamic, in that users are able to customize the layout by moving, hiding, and toggling different dialogs, panels and boxes. This allows users to arrange the layout of the interface to their own requirements, if they so wish (e.g. drawing proofs vertically and showing the underlying Isabelle proof script).

5.4 User-story Walkthrough

This section runs through a user-story for performing a proof in first-order logic. The theorem to be proven by the user is (¬(∀x. P x)) → (∃x. (¬(P x))). The theorem was chosen for demonstration as it is a non-trivial theorem to prove using only natural deduction rules, and thus a good example of how the system can be of use.

The first thing that the user sees when accessing the web application is a login prompt (see Figure 5.9). The user types in their allocated username and presses the login button (note that the system does not prompt for a password, as the creation of a proper user account system with password authentication lay outside the project's scope).


Figure 5.9: Login screen

The user is now presented with an empty desktop (see Figure 5.10). To create a new proof script, the user presses the new file button in the menu on the left.

Figure 5.10: Empty desktop


The user is now presented with a window that asks for a name for the file (see Figure 5.11). The user enters "example" as the name of the file and clicks the create button. The .thy file extension is automatically added to the name if the user omits it.

Figure 5.11: Creating a new file

The user is returned to the desktop. A heading has appeared with the name of the file (see Figure 5.12). So far the proof panel is empty, as the file does not contain any theorems yet.


Figure 5.12: Empty file created

The user is now able to add a theorem by navigating to the script heading in the menu

and clicking the add theorem button (see Figure 5.13). A window appears where the

user can enter the theorem to prove and give the theorem a name (see Figure 5.14).

The user enters the theorem's assumptions (if any) in the Assumptions text box and

the goal to prove in the Goal text box. In order to simplify the process of entering

logical symbols, the user can press the corresponding buttons. The logical symbol will

then be inserted into the text box in focus.


Figure 5.13: Add theorem button

Figure 5.14: Theorem definition prompt

The result of the user creating the theorem is shown in Figure 5.15. A proof box has appeared in the blank space to the right. This shows the theorem that the user aims to prove.

Figure 5.15: New theorem added to script

The user now selects the proof to work on by clicking in the empty area in the middle

of the proof box. The area turns red to indicate that it has been selected. The user

now applies the → i rule backwards by selecting it in the backwards panel of the main

menu (see Figure 5.16). The resulting state of the proof is shown in Figure 5.17.


Figure 5.16: → i button

Figure 5.17: Applying → i backwards

The user continues the proof by selecting the subgoal, navigating to the Isabelle panel, and applying the PBC rule (see Figure 5.18). The PBC rule is under the Isabelle panel as it is an additional rule (not a basic natural deduction rule). The results are shown in Figure 5.19.

Figure 5.18: PBC button

Figure 5.19: Applying PBC backwards the 1st time


The next step for the user is to apply the ¬ε rule backwards (see Figure 5.20). The

system prompts the user for the expression that is to be contradicted. This is a tricky

step, and involves the user making a decision as to what to choose as the new subgoals

to be proven. Based on the user input the system introduces two new subgoals; the

input expression and the negated version of it (see Figures 5.21 and 5.22).

Figure 5.20: ¬ε button


Figure 5.21: Specifying new subgoals (1st ¬ε backwards)

Figure 5.22: Applying ¬ε backwards the 1st time

The first subgoal is easily completed by applying an assumption to it (see Figures 5.23 and 5.24).


Figure 5.23: Assumption button

Figure 5.24: Closing a branch with an assumption

Next, the user applies the ∀i rule backwards (see Figure 5.25).


Figure 5.25: Applying ∀i backwards

Another application of the PBC rule is needed (Figure 5.26).

Figure 5.26: Applying PBC backwards the 2nd time

The user applies another ¬ε step (see Figures 5.27 and 5.28).


Figure 5.27: Specifying new subgoals (2nd ¬ε backwards)

Figure 5.28: Applying ¬ε backwards the 2nd time

The first unclosed subgoal is closed by the application of an assumption. The user then applies the ∃i rule backwards on the second subgoal (see Figures 5.29 and 5.30). The user is prompted with a request to instantiate the quantified variable, which the user sets to x. The new subgoal is easily closed by an assumption application. The proof is now complete, as indicated by the change in colour (see Figure 5.31).

Figure 5.29: Instantiating a quantified variable


Figure 5.30: Applying ∃i backwards

Figure 5.31: The finished proof


Chapter 6

Evaluation

6.1 Introduction

This chapter covers the evaluation of the system that has been developed. The evaluation should determine whether the system has been built correctly (verification) and whether the right system has been built (validation). Verification was done by software testing and validation by a user test.

6.2 Test Data

The system was tested on two sets of example theories: the development set and the evaluation set. The development set was used during the implementation of the system, and the evaluation set was used to test the final implemented system. Each set contained theories spanning a range of difficulty, from simple propositional logic proofs to challenging first-order predicate logic proofs. The development set should be representative of the evaluation set, but should not contain the same theories. It is important that the two sets differ, so that the evaluation can verify that the system works for most natural deduction proofs rather than being specifically tailored to the proofs in the development set.

6.3 Verification

Verification involves checking that the system works correctly (i.e. does not have errors). The testing methods used for this project were black-box testing and glass-box testing (also known as white-box testing).

Verification of the system involved the following tests where appropriate:

• Unit testing


• Integration testing

• System testing

It is difficult to automate the testing of the user interface itself. One issue is that user interfaces tend to undergo frequent and extensive changes. As a result, the test cases need to be updated after each change to follow the new behaviour, which is tedious. Another issue, especially relevant to this project, is that it is difficult to ensure that the underlying system is in the state that the test case expects. It was decided that the user interface would not be tested using an automated test suite. Rather, it would be verified by performing user-stories based on expected usage in order to catch discrepancies from expected results.

Testing was performed during each iteration of the implementation in order to catch

potential bugs at an early stage.

6.3.1 Testing Framework and Tools

Testing of the PHP web service initially involved utilizing PHP's built-in logging func-

tionality in order to monitor results during processing. However, it was decided that

since the web service would involve several steps of parsing and transformation of data,

it would be advantageous to utilize an automated testing framework to verify that

output generated from each step was correct. Phpunit [8], a PHP testing framework

similar to the well known JUnit [21] , was utilized. Test cases were created that were

run against the code to determine if the transformations worked correctly. In the cases

where function outputs were incorrect, the testing framework flagged a warning and

an explanation as to what the discrepancy between expected and actual results was.

Thus, it was easy to identify what had failed, change the code to fix the error and

re-test to verify that the new changes had solved the problem. However, the actual

communication with the PGKit broker was not tested with the framework. This was

due to the PGKit broker being relatively unstable and difficult to restart, which made it difficult

to maintain consistency in automated tests (i.e. a test might suddenly fail even though

no system code had been changed). It was determined that time would be better spent

on other parts of the project than automating the communication testing.

It was decided that a testing framework would not be used for development of the

Javascript/AJAX client code. Currently, automatic testing frameworks are not widely

used in Javascript web application development, other than for testing Javascript

function libraries. JSUnit [25] is one automated testing framework available for

Javascript code. The problem with using testing frameworks for AJAX web pages is

that it involves a lot of work creating unit tests in order to mimic event triggering and

DOM modification. Additionally, it is not suitable for testing systems that rely on

using asynchronous message parsing as testing this would involve holding back the testing

framework until messages were returned and callback functions had finished processing.

The loss of a network message could thus flag a false positive error. It was thus felt

that creating complicated test cases would be a misguided use of time as the interface

changed rapidly from day to day in terms of functionality and how the underlying code

worked.
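The difficulty described above can be illustrated with a minimal sketch. It is not the project's code: fakeRequest is a hypothetical stand-in for an AJAX call, and the sketch uses Promise, which post-dates the tools available to the project. It shows that a test must hold back until the callback fires, and that a lost message makes the test fail even though the code under test is unchanged.

```javascript
// Hypothetical stand-in for an AJAX call: calls back after delayMs,
// or never calls back at all if the message is "lost".
function fakeRequest(delayMs, lost, callback) {
  if (lost) {
    return; // a lost message never triggers the callback
  }
  setTimeout(() => callback("ok"), delayMs);
}

// The test has to wait for the callback, or give up after timeoutMs.
function testRequest(delayMs, lost, timeoutMs) {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve("FAIL: timed out"), timeoutMs);
    fakeRequest(delayMs, lost, (reply) => {
      clearTimeout(timer);
      resolve(reply === "ok" ? "PASS" : "FAIL: unexpected reply");
    });
  });
}

// Same code under test, different outcome depending on the network.
testRequest(10, false, 100).then((result) => console.log(result));
testRequest(10, true, 100).then((result) => console.log(result));
```

The second call reports a failure purely because the simulated message was lost, which is the kind of false alarm the paragraph above refers to.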

The web application was debugged and tested using the log4javascript logging library

[18] and the Firebug [24] web development package. The latter contains a collection of

tools practical for web application development. Firebug provides methods to inspect

the DOM, monitor live Javascript processing and keep track of incoming and outgo-

ing messages. Firebug also makes it possible to monitor network activity in order to

determine response time.

6.3.2 Unit Testing

Unit testing involves testing system modules and components separately from the whole

system. Testing at the unit stage makes it easier to identify where errors occur com-

pared to only testing at the higher level of the system. Unit testing was done during

the implementation of the system, and involved both glass-box and black-box testing

methods. Furthermore, for the web service, the running of unit tests was in several cases

automated by the PHPUnit testing framework.

Unit testing was to a lesser extent performed on the web client due to the close coupling

of the function code to the user interface (i.e. DOM structures had to exist in order

for certain functions to be tested properly). However, where appropriate and allowable,

unit testing was performed.

6.3.3 Integration testing

Integration testing involves testing the integration between different components of the

system, and how they interact. Testing of this aspect of the system relied on using the

black-box testing method.

Integration testing was performed by intercepting and monitoring messages sent between

the components of the system, and inspecting generated log files.
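As an illustration only, message interception of this kind can be sketched as a wrapper that records each request and reply passing between two components. The names withMessageLog and fakeWebService are hypothetical, and the message format is invented for the example; the project itself relied on Firebug and generated log files rather than on code like this.

```javascript
// Wraps a send function so that every outgoing message and its reply
// are recorded in a log array for later inspection.
function withMessageLog(label, send, log) {
  return function (message) {
    log.push({ label: label, direction: "request", body: message });
    const reply = send(message);
    log.push({ label: label, direction: "response", body: reply });
    return reply;
  };
}

// Hypothetical stand-in for the web service: wraps the command in a reply.
function fakeWebService(message) {
  return "<reply>" + message + "</reply>";
}

const messageLog = [];
const loggedSend = withMessageLog("client->service", fakeWebService, messageLog);
loggedSend("apply rule");

// Inspect what passed between the two components.
messageLog.forEach((entry) =>
  console.log(entry.label, entry.direction, entry.body)
);
```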

6.3.4 System Testing

System testing involves testing the system as a whole to determine how well the

system matches the specifications. System testing relied on using the black-box testing

method. When deviations from expected results were found, the errors were tracked

down and the problematic code modified to remove the discrepancy between expected

and actual results. The system was then re-tested to verify that the problem was prop-

erly addressed. The system went through several of these iterations until the system

was deemed to be in a satisfactory state to be tested on users.

6.3.5 Results

In general, the system reached an acceptable level to be used for user testing. It passed

the automated tests done on the PHP code, as well as the manual tests performed.

However, due to the PGKit broker's problems with undoing object ranges and

intermittent lock-ups, the system was somewhat unstable (see Subsection 5.2.1). These were

problems that lay outside of the developed system and were thus difficult to mitigate or

overcome.

Early in the user test (to be discussed in Section 6.4.1) it became clear that there was an

error with adding new theorems to an open script. The problem identified was that the

system did not properly capture the error that occurred when users entered an invalid

formula (i.e. not a WFF). The Javascript function responsible for adding new objects to

the proof script checks web server responses for errors before processing results.

However, in the case mentioned above, the server-side system was not returning proper

failure messages (i.e. the error message was not marked up with

<failstep></failstep> tags). Although the process of adding proof commands to scripts

utilizes the same functions (both client-side and server-side) as the process of adding

theorems, the latter process does not experience this problem (returns proper error

messages upon invalid syntax or invalid proof step). To overcome this, the client code

responsible for adding a theorem had to stop using the generic AJAX function for

sending proof script editing requests. The AJAX calls were now coded into the function

for creating new theorems as error checking had to be done differently for this specific

activity.
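The generic check, and the reason it missed this case, can be sketched as follows. This is a minimal illustration under stated assumptions: only the <failstep> tag name comes from the text above, while the function name extractFailstep, the surrounding <result> element and the error text are hypothetical. The sketch does not show the special-case check that was eventually coded into the theorem-creation function.

```javascript
// Returns the failure message if the response contains a <failstep>
// element, or null if no such element is present.
function extractFailstep(responseText) {
  const match = /<failstep>([\s\S]*?)<\/failstep>/.exec(responseText);
  return match ? match[1].trim() : null;
}

// Hypothetical responses; the real message format is not reproduced here.
const markedUp = "<result><failstep>Inner syntax error</failstep></result>";
const unmarked = "<result>Inner syntax error</result>";

// A marked-up error is detected and its message extracted.
console.log(extractFailstep(markedUp));

// An error without <failstep> tags yields null, so a check based only
// on this markup treats the response as a success.
console.log(extractFailstep(unmarked));
```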

The system's responsiveness was somewhat difficult to evaluate in terms of getting

meaningful quantitative data. The complicating factor was that since the system could

not be deployed on the DICE system, it had to be run on a system outside of the

University of Edinburgh. Unfortunately, the machine it was running on had limited

internal memory and was connected to a relatively slow Internet connection. As the system

is highly dependent on the hardware environment and network speed, this hosting

environment was far from ideal. However, the system

was tested with 4 simultaneous users accessing it through the Internet, with acceptable

results in terms of system responsiveness. Testing the response time on the local network

(rather than over the Internet connection as this was known to be a limitation) gave

response times ranging from 60 ms to 1500 ms, depending on the size of the proof script,

as most of the delay incurred is due to Isabelle's processing.

6.4 Validation

Validation involves checking that the right system has been built and that it satisfies the

needs of the user. The validation process sought to determine how well the implemented

system addresses user needs and whether it reached an acceptable level of usability. In

order to determine this, a user test was performed.

6.4.1 User Test

The user test involved testing the implemented system on a small sample of users

utilizing the system to perform interactive theorem proving. This would make it possible

to evaluate the appropriateness of creating proof scripts using a graphical user interface

with point-and-click functionality. Although the assessment was bound to be relatively

subjective, it should give a fair indication of the appropriateness of the implemented

system.

Due to the specialized nature of the study area, it was required that test subjects have

at least some knowledge about symbolic logic and creating logical proofs. This limited

who could participate in the study, and resulted in the group of test subjects being quite

small. Ideally, the system should have been tested on novice students learning logic.

However, as the project took place during the summer months, the supply of ideal test

candidates was limited. Undergraduate informatics students would have been good test

candidates, but they were not present at the university during this time period. As a

result, the system was tested on PhD students and academic staff who have experience

with the use of logic and theorem provers. The test candidates were all expert users

of theorem provers and so were somewhat out of the scope of the intended audience.

However, it was determined that these test candidates were the most appropriate to test

the system on given the circumstances, as it would not be possible to test the system

on users without knowledge of symbolic logic.

6.4.1.1 Questionnaire

A questionnaire was developed to capture the participants' evaluation of the system's

usability, appropriateness, user interface and stability. The questionnaire was divided

into 6 parts.

Part 1 aimed at assessing the test participants' experience with

logic and theorem proving. More importantly, it also captured information

about what style of natural deduction notation they found easiest to follow in terms of

purely linear proof versus box-style proof.

Part 2 contained a one-page user introduction to the system (see Appendix C). This

introduced users to the system through a pre-created demonstration file containing an

unfinished proof. By following the guide, the user learned how to open a file, apply a

proof rule, create a new file and add a new theorem.

Part 3 listed 7 theorems, ranging from propositional logic to first-order predicate logic,

and asked the user to perform a proof of 2 or 3 of these using the system.

Part 4 aimed to capture the participants' experiences of performing the proofs in terms of potential

errors in the system, the presentation of box-proofs and whether they felt the system

was of any help when performing natural deduction proofs. It also captured information

about whether the system would be of any use to the participants themselves.

Part 5 aimed to evaluate the user interface itself. The user interface evaluation relied

on the participant's subjective evaluation rather than measuring user performance, as

it was felt that measuring user performance (e.g. tracking time between user actions or

counting clicks) would not contribute much to the evaluation of this system. Also,

due to the nature of the system, a cross-comparison study between textual

proof assistants (such as Proof General) and the implemented system was not performed,

as it would be difficult to gather meaningful quantitative data from such a study: the two

are aimed at two different types of users. Additionally, a cross-comparison study

is time-consuming to run, and it is difficult to draw concrete results from one.

To quantify the subjective evaluation of the user interface and give an indication of its

appropriateness, the System Usability Scale (SUS) was used [15]. The SUS

uses a pre-set questionnaire (see Appendix C) containing 10 Likert scale questions

regarding how the participant experienced using the system under consideration. The

score calculation gives a value between 0 and 100 (where 0 is low usability and 100 is

high usability), meant to be used as a relative usability estimate.

Part 6 of the questionnaire aimed at capturing any additional comments and feedback

about the system that the user might wish to contribute.

6.4.2 Results of User Test

The user test initially involved 6 participants. However, one participant never returned

the questionnaire and was excluded from the analysis of the study. Additionally, one

questionnaire lacked answers to part 5 (the SUS evaluation).

6.4.2.1 Questionnaire part 1

The general information extracted about the test subjects was that they were

reasonably to highly confident with symbolic logic and natural deduction, as well as with

performing pen-and-paper proofs. 60% of the participants knew how to use box-proof

style notation, and 80% had experience with using the Isabelle theorem prover. The

relatively high number of participants who knew box-style notation was

beneficial for evaluating the system. However, it is slightly unfortunate that so many of the

participants had previous experience with Isabelle, making it difficult to evaluate how

much knowledge one needs of Isabelle (which we would hope to be none).

6.4.2.2 Questionnaire parts 2-4, 6

These were grouped together, as parts 2 and 3 do not ask questions, and parts 4

and 6 received overlapping replies.

All the respondents managed to use the system to perform theorem proving. Addition-

ally, all participants felt that the system made it easier to perform theorem proving

in natural deduction in terms of representing proofs and not needing prover-specific

knowledge. Also, there were no issues flagged as to how box-proofs were displayed

(i.e. if there were any discrepancies from what they associate with box-proofs). All the

participants felt that the system would be useful for novices.

The beneficial features reported by the users were (paraphrased and duplicates

removed):

• Box-proofs give a good overview of where you are in the proof and how [the

proof] has progressed

• Easy to use

• No need to know prover syntax

• Helpful hints

• Easy to enter new theorems

• Only web browser needed, no install or compilation required

• Makes it easy to apply rules

• Less confusion due to limit on available rules

• No need for instructions

• Clear and easy-to-use user interface, a considerable improvement over Proof General

• Point-and-click proofs easier to perform

There were also some negative aspects of the system. Unfortunately, all of the users

experienced problems with undoing steps (see the problem noted with the PGKit broker in

Section 5.2.1). Users also experienced intermittent system crashes (found to be due to

PGKit broker deadlocks) and problems with entering new theorems, which also

corrupted the script file if saved (this was corrected early in the user test, as noted in Section 6.3.5).

Other limitations identified by the users were:

• Failure messages were not very informative

• Would like to be able to reuse lemmas, as the number of rules available limits its

usefulness

• Would like shortcuts to functions rather than having to point-and-click each time

• Latency reduced responsiveness

Furthermore, all the participants noted that the system would probably not be of use

for them as they were at a PhD / academic lecturer level, and thus rely on using proof

rules that lie outside of the scope of natural deduction (e.g. induction). They would

usually not prove such non-trivial proofs interactively but rather use automated tactics.

This was expected as the system was intended for users in the novice to mid-experienced

range rather than experts.

6.4.2.3 Questionnaire part 5

SUS scores were determined using the following calculation [15] (paraphrased from the

SUS chapter):

1. Subtract the value 1 from the scale position of all odd-numbered questions.

2. Subtract the scale position value from the value 5 for all even-numbered questions.

3. Sum the scores and multiply by 2.5 to get the usability estimate.
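The three steps above can be sketched in a few lines of Javascript. This is an illustrative sketch rather than the script used in the project: the function name susScore is hypothetical, and the example responses are invented, not taken from the study. Responses are given in question order as scale positions from 1 to 5.

```javascript
// Compute a System Usability Scale (SUS) score from ten Likert
// responses, each a scale position from 1 to 5, in question order.
function susScore(responses) {
  if (!Array.isArray(responses) || responses.length !== 10) {
    throw new Error("SUS requires exactly 10 responses");
  }
  let sum = 0;
  responses.forEach((value, index) => {
    if (!Number.isInteger(value) || value < 1 || value > 5) {
      throw new Error("Each response must be an integer from 1 to 5");
    }
    const questionNumber = index + 1;
    // Step 1: odd-numbered questions contribute (scale position - 1).
    // Step 2: even-numbered questions contribute (5 - scale position).
    sum += questionNumber % 2 === 1 ? value - 1 : 5 - value;
  });
  // Step 3: multiply the sum by 2.5 to obtain a value between 0 and 100.
  return sum * 2.5;
}

// Hypothetical responses (not from the study): every answer is neutral.
console.log(susScore([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])); // 50
```

Each question contributes between 0 and 4 points, so the sum lies between 0 and 40 and the final score between 0 and 100.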

The individual results of the questionnaires are shown in Table 6.1.

Participant ID    Calculated SUS score
1                 85/100
2                 75/100
3                 82.5/100
4                 55/100

Table 6.1: SUS evaluation scores

The average SUS score calculated was 74.4, which is a fairly high usability estimate.

This estimate gives a strong indication that the users found the system to be of potential

use.
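For reference, the average can be worked out directly from the four scores listed in Table 6.1:

```latex
\[
\overline{\mathrm{SUS}}
  = \frac{85 + 75 + 82.5 + 55}{4}
  = \frac{297.5}{4}
  = 74.375
\]
```

With only four completed SUS questionnaires, this mean is best read as a rough indicator rather than a precise measurement.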

6.4.3 Summary of Results

Although the user evaluation brought to light problems with the stability of the

system, all the participants managed to use the system to perform proofs. The high

average usability estimate and the positive questionnaire feedback received

are taken as an indication that the system is of practical use. The questionnaire indicated

that the system does help in visualizing and performing natural deduction proofs and

manages to address the problems outlined in the project description. However, there

is a range of recommended improvements (e.g. stability, additional

functionality and more informative error messages).

Chapter 7

Discussion

7.1 Introduction

This chapter summarizes the project, including its achievements and limitations, and

puts it in a wider context for review.

7.2 Achievements

The project was successful in achieving its outlined aims and objectives. The system

successfully provides a graphical user interface that allows users to perform point-and-

click creation of proof scripts and visualizes the proofs in a graphical style that is easy

to follow. The system utilizes a well-known and sound interactive theorem prover. In addition,

the system utilizes a web based client/server architecture, allowing the user to use the

system remotely without having to install additional software other than having access

to a web browser. The user test indicated that users found it to be quite usable, given

the high average usability estimate and positive feedback.

Although there exist systems offering similar functionality in terms of visualizing

proofs, point-and-click interaction, and providing interactive theorem proving through a

client/server system, there does not, as far as this author is aware, exist any other

system that provides all these functions simultaneously. Pandora runs as a Java

applet, so is run completely on the client side. Furthermore, it does not use a standard

interactive theorem prover to verify proofs.

ProofWeb is distributed, responsive, utilizes a proper interactive theorem prover and has

the ability to render proofs. However, it does not allow point-and-click proof creation,

and it currently only allows proof rendering to be done using text symbols rather than

using pixel-based graphics (making it slightly more difficult to read). Another advantage

our system has is that it is less coupled to the respective theorem prover as it uses the

PGIP protocol and the PGKit broker which are meant to be non-prover specic. The

ProofWeb system allows the use of a limited number of interactive provers other

than Coq (e.g. Isabelle). However, few of the functionalities offered by ProofWeb

when using Coq are available with the others. Additionally, ProofWeb does not provide

a common interface to interact with theorem provers (no general protocol used for

interaction), relying instead on ad-hoc coding to utilize their functions. Our system is

not completely separated from coupling to Isabelle either in that it only understands and

generates commands in the Isabelle Isar language. However, changing to use another

theorem prover should theoretically not be too difficult as most of the Isabelle-dependent

functionality is at the web-application side (the XML parsing at the client side would

require changes as well, possibly only needing this transformation to be disabled, as other

provers can output results in XML). However, this relies on PGIP wrappers becoming

available for other interactive theorem provers.

7.3 Limitations

The developed system has a fair share of limitations as well. Unfortunately, the system

remains in a somewhat unstable state. This is mostly attributed to the PGKit broker

and the numerous problems experienced with using it (lack of functionality, frequent

failures, errors in functionality such as undoing objects as mentioned in Section 5.2.1).

The system also experiences some user interface quirks (e.g. disappearing data when

clicking outside a window, display panels overflowing past their borders if given too much content

as mentioned in Section 6.4.2). Other limitations that should be addressed if time

allowed are the inability to use proven lemmas in proofs and the fact that the

system's error messages are not very informative.

7.4 Criticism

There are several design decisions taken that, in retrospect, might not have been the

best solutions. These will now be discussed.

7.4.1 General Architecture

The high-level architecture could be simplified. The current architecture involves relying

on a relatively long chain of dependent components at the server side. Ideally, the system

should be streamlined to reduce overhead. One possible solution is to remove the need

for the PGKit broker, rolling its functionality into the web service, thus being able to

only have one node between the web client and the Isabelle theorem prover. This would

reduce latency and potential errors, and make the system easier to maintain. Loose coupling could

still be maintained if the PGIP protocol was still to be used for interacting with the

underlying interactive theorem prover, as it is not prover-specific.

7.4.2 PGKit Broker

The poor state of the PGKit broker resulted in a wide range of workarounds having

to be devised in order to utilize the sought-after functionality. If the full information

regarding the state of the PGKit broker had been available at the start of the project,

the system design currently used might not have been chosen. The fact that few PGIP

commands are implemented in the PGKit broker, and that the system is unstable, are

issues that the PGKit authors need to address if they intend the broker to be used for

new proof editors.

7.4.3 Isabelle

It was very unfortunate that the XML output functionality of Isabelle was taken out.

Furthermore, David Aspinall's explanation as to why it was taken out was quite

puzzling. One is led to question why it was not left in the system even though it was not

much used at the time.

Having the Isabelle results marked-up in XML by Isabelle itself would have made the

system less brittle, as the web service currently has to perform the mark-up based on

textual matching.

Furthermore, it is unfortunate that Isabelle does not utilize labelling of assumptions.

This would have made the system easier to implement and less brittle in terms of being

able to refer to labels when specifying rule premises rather than entering the whole

expression. Labelling would also be useful when creating the side explanation of a proof

step result.

7.4.4 Web Client

The rendering of proof results into HTML representation is currently quite

processing-intensive. It was felt that the code responsible for rendering could have been improved to

process quicker and thus make the system feel more reactive.

It is regrettable that the system does not work properly with the Internet Explorer web

browser. However, it is not alone in this (cf. ProofWeb).

Furthermore, there are usability issues with the system's user interface that could benefit

from more attention (e.g. making it more intuitive, performing more error checking on user

input, and providing helpful error messages as to what part of the rule application failed).

7.5 Outlook on Subject

The following is an outlook on possible changes in the field of user interfaces

for proof editors.

The ProofWeb project seems to have found a niche and has secured funding to continue

being developed for another one and a half years. The system has expanded rapidly, and the latest

version seems promising as it allows both Fitch-style and Gentzen-style

visualization of proofs. The system, according to [31], has fully replaced the need for

other proof editors such as Proof General, even for expert use.

Pandora is a tried-and-tested system that has been fully embraced in teaching at Impe-

rial College London. However, it does not seem that the system has any strong points

that distinguish it from other similar systems. It seems unlikely that the system will

be much adopted outside of that educational institution.

The Jape system has not undergone any large changes during the last few years. As

a result, we do not expect that any significant new contributions to the field will come

from this.

The Proof General for Eclipse project seems to be progressing with the development of the

system that is meant to replace the Proof General Emacs user interface [3]. It is the

author's opinion that the new system does not bring many improvements in terms

of usability and functionality, thus it is unlikely that it will change the field significantly.

However, an interesting change within the Proof General project is the recent release

of the PGIP 3.0 protocol [5]. The protocol has been simplified substantially to remove

unnecessary information and redundant functions. The new protocol is aimed towards

being used by stateless network clients. This shift in direction addresses some of the

problems experienced during the implementation of the system. However, there is no

word of any applications that are being developed to use this new protocol revision.

7.6 Future Work

If time permitted, there are several issues that could be improved in the system. The

most important improvements to the system would be to improve the stability by mod-

ifying the PGKit broker source code to cater for the critical errors experienced, to cater

for the Internet Explorer web browser and to further improve the usability of the user

interface. Furthermore, the system would benefit from allowing users to reuse proven

lemmas in proofs. In the bigger picture, it would be desirable that the system supported

the creation of proofs using rules that lie outside of natural deduction.

Putting the system aside, there are several issues within the field of interactive theorem

prover user interfaces that warrant more work. One area of potential study is that

of streamlining proof scripting in terms of reducing the difference between scripts in

differing systems to allow users to jump between systems without having to learn the

system from scratch. Another interesting issue is that of creating user interfaces that

allow changing the way it renders the proof based on the type of theorem proving being

performed.

7.7 Final Remarks

The implemented system is in a usable state, and user test evaluations gave evidence

that it is successful in making it easier to perform proofs (removing the need for

prover-specific knowledge, removing the need to install software locally and providing the ability to

visualize proofs). It managed to successfully fill a small gap in the field of proof editor

user interfaces that no other system (to the author's best knowledge) has properly

addressed.

The author's experience of undertaking this project was overall quite positive. Although

the system was less straightforward to implement than initially foreseen, a usable system

was successfully developed that managed to address all the aims and objectives outlined

for the project. There were times during the project when it seemed unlikely that all

the objectives could be met. However, solutions were found to all these hindrances.

On a more personal note, the author feels that it was rewarding to manage to undertake

and successfully complete such a relatively large project.

Bibliography

[1] Konstantine Arkoudas. Simplifying Proofs in Fitch-Style Natural Deduction

Systems. Journal of Automated Reasoning, 34(3):239–294, 2005.

[2] David Aspinall. Proof General. http://proofgeneral.inf.ed.ac.uk. Accessed

13. March 2007.

[3] David Aspinall. Proof General Eclipse. http://proofgeneral.inf.ed.ac.uk/

eclipse. Accessed 13. March 2007.

[4] David Aspinall. Proof General Kit - White Paper. http://homepages.inf.ed.

ac.uk/da/papers/drafts/white.pdf, July 2003. Accessed 13. March 2007.

[5] David Aspinall and Christoph Lüth. PGIP, the Proof General Interaction Proto-

col. http://proofgeneral.inf.ed.ac.uk/wiki/Main/PGIP. Accessed 10. August

2007.

[6] David Aspinall and Christoph Lüth. Proof General meets IsaWin: Combining Text-

Based and Graphical User Interfaces. Electronic Notes in Theoretical Computer

Science, 103:3–26, November 2004.

[7] David Aspinall and Christoph Lüth. Commentary on PGIP. http://

proofgeneral.inf.ed.ac.uk/Kit/docs/commentary.pdf, March 2007. Accessed

10. August 2007.

[8] Sebastian Bergmann. PHPUnit. http://www.phpunit.de. Accessed 13. July 2007.

[9] Yves Bertot, Ahmed Amerkad, Pascal Lequang, Loïc Pottier, and Laurence Rideau.

Pcoq: A java-based user-interface for Coq. http://www-sop.inria.fr/lemme/

pcoq. Accessed 13. March 2007.

[10] Richard Bornat. Natural Deduction Proof and Disproof in Jape.

http://jape.comlab.ox.ac.uk:8080/jape/DOCUMENTS/CURRENT/natural_

deduction_manual.pdf, March 2005. Accessed 13. March 2007.

[11] Richard Bornat and Bernard Sufrin. Jape. http://www.jape.org.uk. Accessed

13. March 2007.

[12] Richard Bornat and Bernard Sufrin. Animating formal proof at the surface: The

jape proof calculator. Computer Journal, 42(3):177–192, 1999.

[13] Krysia Broda. Pandora Help - Rules. http://www.doc.ic.ac.uk/pandora/bin/

help/Rules/rules.html. Accessed 13. March 2007.

[14] Krysia Broda, Jiefei Ma, Gabrielle Sinnadurai, and Alex Summers. Friendly e-tutor

for Natural Deduction. In Proceedings of Teaching Formal Methods: Practice and

Experience (TFM 2006), London, UK, 2006.

[15] John Brooke. SUS - A quick and dirty usability scale. Digital Equipment Corpo-

ration, Ltd., 1986.

[16] Ewen Denney, John Power, and Konstantinos Tourlas. Hiproofs: A Hierarchical

Notion of Proof Tree. Electronic Notes in Theoretical Computer Science,

155:341–359, 2006.

[17] Bob Dougherty and Alex Wade. VisCheck. http://www.vischeck.com/. Accessed

20. August 2007.

[18] Tim Down. log4javascript - a JavaScript logging framework. http://www.timdown.

co.uk/log4javascript/. Accessed 13. July 2007.

[19] Jens Dørup, Michael Schacht Hansen, Lars Riisgaard Ribe, and Kristoffer Larsen.

A comparison of technologies for database-driven websites for medical education.

Medical Informatics and the Internet in Medicine, 27(4):281–289, 2002.

[20] Thomas Fuchs. script.aculo.us - web 2.0 javascript. http://script.aculo.us.

Accessed 13. March 2007.

[21] Erich Gamma and Kent Beck. JUnit. http://www.junit.org. Accessed 13. July

2007.

[22] Jesse James Garrett. Ajax: A New Approach to Web Applications. http://

www.adaptivepath.com/publications/essays/archives/000385.php, February

2005. Accessed 13. March 2007.

[23] Michael J. C. Gordon, Robin Milner, and Christopher P. Wadsworth. Edinburgh

LCF, volume 78 of Lecture Notes in Computer Science. Springer, 1979.

[24] Joe Hewitt. Firebug - Web Development Evolved. http://www.getfirebug.com/.

Accessed 13. July 2007.

[25] Edward Hieatt. JSUnit. http://www.jsunit.net/. Accessed 13. July 2007.

[26] Martin Homik and Andreas Meier. Designing a Proof GUI for Non-Experts –

Evaluation of an Experiment. In Proceedings of the International Workshop on

User Interfaces for Theorem Provers (UITP 2005), pages 160–178, Edinburgh,

Scotland, 2005.

[27] Michael Huth and Mark Ryan. Logic in computer science: modelling and reasoning

about systems. Cambridge University Press, New York, NY, USA, second edition,

2004.

[28] Jacques Fleuriot. Designing and Implementing an Interactive Natural Deduction

Proof Editor for Isabelle. http://homepages.inf.ed.ac.uk/alex/msc/project.

php?number=P014. Accessed 10. August 2007.

[29] Jakob Nielsen. Heuristics for User Interface Design. http://www.useit.com/

papers/heuristic/heuristic_list.html, 2005. Accessed 10. August 2007.

[30] Cezary Kaliszyk. Web Interfaces for Proof Assistants. In Proceedings of the Inter-

national Workshop on User Interfaces for Theorem Provers (UITP 2006), Seattle,

USA, 2006.

[31] Cezary Kaliszyk, Freek Wiedijk, Maxim Hendriks, and Femke van Raamsdonk.

Teaching logic using a state-of-the-art proof assistant. In Proceedings of the Inter-

national Workshop on Proof Assistants and Types in Education (PATE'07), pages

33–46, Paris, France, June 2007.

[32] Loïc Pottier. LogiCoq. http://wims.unice.fr/wims/wims.cgi?module=U3/

logic/logicoq. Accessed 13. March 2007.

[33] National Center for Supercomputing Applications at the University of Illinois.

Common Gateway Interface. http://hoohoo.ncsa.uiuc.edu/cgi/. Accessed 10.

August 2007.

[34] Tobias Nipkow and Larry Paulson. Isabelle. http://isabelle.in.tum.de/. Ac-

cessed 10. August 2007.

[35] Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. Isabelle/HOL – A

Proof Assistant for Higher-Order Logic, volume 2283 of Lecture Notes in Computer

Science. Springer, 2002.

[36] John K. Ousterhout. Scripting: Higher-level programming for the 21st century.

IEEE Computer, 31(3):2330, 1998.

[37] Lawrence C. Paulson. Isabelle: The next 700 theorem provers. In P. Odifreddi,

editor, Logic and Computer Science, pages 361386. Academic Press, 1990.

[38] Stuart Russell and Peter Norvig. Articial Intelligence: A Modern Approach.

Prentice-Hall, Englewood Clis, NJ, second edition, 2003.

[39] J. Siekmann, S. M. Hess, C. Benzmüller, L. Cheikhrouhou, D. Fehrer, A. Fiedler,

M. Kohlhase, K. Konrad, E. Melis, A. Meier, and V. Sorge. LΩUI: A Distributed

Graphical User Interface for the Interactive Proof System ΩMEGA. In Proceedings

of the International Workshop on User Interfaces for Theorem Provers (UITP-98),

Eindhoven, Netherlands, 1998.

[40] Sam Stephenson. Prototype js. http://www.prototypejs.org. Accessed 13.

March 2007.

[41] The Coq Development Team. The Coq proof assistant. http://coq.inria.fr/.

Accessed 13. July 2007.

[42] The HOL 4 Development Team. HOL 4 Kananaskis 4. http://hol.sourceforge.net/.

Accessed 13. July 2007.

[43] Norbert Völker. Thoughts on Requirements and Design Issues of User Interfaces for

Proof Assistants. Electronic Notes in Theoretical Computer Science, 103:139–159,

2004.

[44] W3C Web Services Architecture Working Group. Web Services Architecture.

http://www.w3.org/TR/ws-arch/. Accessed 13. March 2007.

[45] Daniel Winterstein, David Aspinall, and Christoph Lüth. Proof General Kit.

http://proofgeneral.inf.ed.ac.uk/Kit. Accessed 10. August 2007.

[46] Daniel Winterstein, David Aspinall, and Christoph Lüth. Parsing, Editing,

Proving: The PGIP Display Protocol. In Proceedings of the International Workshop on

User Interfaces for Theorem Provers (UITP 2005), Edinburgh, Scotland, 2005.

[47] Daniel Winterstein, David Aspinall, and Christoph Lüth. Proof General / Eclipse:

A Generic Interface for Interactive Proof. In Proceedings of the International Joint

Conference on Artificial Intelligence (IJCAI 2005), pages 1587–1588, 2005.

[48] World Wide Web Consortium. Cascading Style Sheets Level 2 Revision 1 (CSS

2.1) Specification. http://www.w3.org/TR/CSS21/. Accessed 10. August 2007.

[49] World Wide Web Consortium. Document Object Model (DOM) Level 3 Core

Specification. http://www.w3.org/TR/DOM-Level-3-Core/. Accessed 10. August

2007.

[50] World Wide Web Consortium. Extensible Markup Language (XML) 1.0.

http://www.w3.org/TR/xml/. Accessed 10. August 2007.

[51] World Wide Web Consortium. HTTP - Hypertext Transfer Protocol.

http://www.w3.org/Protocols/. Accessed 10. August 2007.

[52] World Wide Web Consortium. XML Path Language (XPath).

http://www.w3.org/TR/xpath. Accessed 10. August 2007.

[53] World Wide Web Consortium DOM interest group. Document Object Model

(DOM). http://www.w3.org/DOM/. Accessed 10. August 2007.

[54] Yahoo! Inc. Yahoo UI! Library. http://developer.yahoo.com/yui. Accessed 13.

March 2007.

Appendix A

XML Messages

A.1 PGIP reply

The following PGIP message is a reply from the PGKit to the web client, sent when the

client requests that the command apply (rule_tac [1] conjI) be added to the proof script.

Note the amount of information sent and the redundant object list operations performed:

object a275 is created as unparseable, then deleted and replaced by the parsed objects

a277 and a278.

<pgip tag="Broker" id="broker:ubuntu/huh/16785/2007717-16519-Z" class="pd" refid="ws" refseq="7990" seq="7991">
  <dispobjmsg>
    <newobj proverid="/huh/16786/1187366719.888" srcid="f20a" objid="a275" objposition="a26d" objtype="UnknownType" objstate="unparseable">
      <unparseable>apply (rule_tac [1] conjI)
      </unparseable>
    </newobj>
  </dispobjmsg>
</pgip>

pd" seq="7993">
  <proverstate proverid="/huh/16786/1187366719.888" provername="Isabelle 2005/HOL" proverstate="busy"/>
</pgip>

  <dispobjmsg>
    <replaceobjs srcid="f20a" replacedfrom="a275" replacedto="a275">
      <delobj proverid="/huh/16786/1187366719.888" srcid="f20a" objid="a275"/>
      <newobj proverid="/huh/16786/1187366719.888" srcid="f20a" objid="a277" objposition="a26d" objtype="ProofType" objstate="parsed">
        <proofstep>apply (rule_tac [1] conjI)</proofstep>
      </newobj>
      <newobj proverid="/huh/16786/1187366719.888" srcid="f20a" objid="a278" objposition="a26d" objtype="comment" objstate="parsed">
        <whitespace>
        </whitespace>
      </newobj>
    </replaceobjs>
  </dispobjmsg>
</pgip>

  <filestatus proverid="/huh/16786/1187366719.888" srcid="f20a" newstatus="changed" url="file:///home/huh/theories/test3/remove.thy" datetime="2007-08-19T00:45:49Z"/>
</pgip>

pd" seq="7996">
  <proverstate proverid="/huh/16786/1187366719.888" provername="Isabelle 2005/HOL" proverstate="ready"/>
</pgip>
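As a rough illustration of the overhead noted above, the object operations in such a reply can be tallied programmatically. The following Python sketch is not part of the system described in this report. It assumes an abbreviated copy of the reply (most attributes omitted) and wraps the sequence of messages in a synthetic <reply> root so that it parses as a single document; the function name summarise_reply is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Abbreviated sample modelled on the reply above. Attribute lists are
# shortened for readability; this is an assumption, not the full message.
SAMPLE_REPLY = """
<pgip class="pd">
  <dispobjmsg>
    <newobj objid="a275" objtype="UnknownType" objstate="unparseable">
      <unparseable>apply (rule_tac [1] conjI)</unparseable>
    </newobj>
  </dispobjmsg>
</pgip>
<pgip class="pd">
  <proverstate proverstate="busy"/>
</pgip>
<pgip class="pd">
  <dispobjmsg>
    <replaceobjs replacedfrom="a275" replacedto="a275">
      <delobj objid="a275"/>
      <newobj objid="a277" objtype="ProofType" objstate="parsed">
        <proofstep>apply (rule_tac [1] conjI)</proofstep>
      </newobj>
      <newobj objid="a278" objtype="comment" objstate="parsed">
        <whitespace> </whitespace>
      </newobj>
    </replaceobjs>
  </dispobjmsg>
</pgip>
<pgip class="pd">
  <proverstate proverstate="ready"/>
</pgip>
"""


def summarise_reply(reply_text):
    """Count messages, object operations and prover states in a reply."""
    # A reply is a sequence of <pgip> elements, so add a synthetic root.
    root = ET.fromstring("<reply>" + reply_text + "</reply>")
    operations = Counter(
        elem.tag
        for elem in root.iter()
        if elem.tag in ("newobj", "delobj", "replaceobjs")
    )
    states = [elem.get("proverstate") for elem in root.iter("proverstate")]
    return len(root.findall("pgip")), operations, states


if __name__ == "__main__":
    messages, operations, states = summarise_reply(SAMPLE_REPLY)
    print(messages, dict(operations), states)
```

For a single added proof step, the sample contains four messages, three object creations, one deletion and one replacement, which is the redundancy the text refers to.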

A.2 PGIP state display vs. revised XML

The following PGIP message contains a state result returned from Isabelle (not properly

marked up as processable XML).


<pgip tag="Isabelle/Isar [broker]" id="/huh/6241/1187523268.329" class="pd" refid="broker:ubuntu/huh/6240/2007719-113428-Z" refseq="1032" seq="359">
  <proofstate proverid="/huh/6241/1187523268.329">
    <pgml>
      <statedisplay>proof (prove): step 1

goal (2 subgoals):
1. (
<sym name="lbrakk">\<lbrakk></sym>(
<atom kind="free">A</atom>
<sym name="and">\<and></sym>
<atom kind="free">B</atom>); (
<atom kind="free">B</atom>

<atom kind="free">A</atom>)
<sym name="rbrakk">\<rbrakk></sym>
<sym name="Longrightarrow">\<Longrightarrow></sym> (

<atom kind="free">B</atom>))
2. (
<sym name="lbrakk">\<lbrakk></sym>(

<atom kind="free">B</atom>); (

<atom kind="free">A</atom>)
<sym name="rbrakk">\<rbrakk></sym>
<sym name="Longrightarrow">\<Longrightarrow></sym> (

<atom kind="free">A</atom>))</statedisplay>
    </pgml>
  </proofstate>
</pgip>

The following message, generated by the web service upon receiving the above PGIP

message, shows the state result marked up as processable XML.


<body>
  <result id="a76">
    <tree step="1" subgoals="2">
      <subgoal id="1">
        <given>
          <bracket>

            <symbol name="and">and</symbol>

          </bracket>
        </given>
        <given>
          <bracket>

          </bracket>
        </given>
        <goal>
          <bracket>

          </bracket>
        </goal>
      </subgoal>
      <subgoal id="2">
        <given>
          <bracket>

          </bracket>
        </given>
        <given>
          <bracket>

          </bracket>
        </given>
        <goal>
          <bracket>

          </bracket>
        </goal>
      </subgoal>
    </tree>
  </result>
</body>
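To illustrate why the revised mark-up is easier to process, the following Python sketch walks a message of this shape and recovers the givens and the goal of each subgoal. It is an illustration only, not the code used in the system: the sample message is hypothetical, the use of <atom> elements inside <bracket> is an assumption carried over from the PGIP listing, and the names render and parse_state are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical message following the structure shown above. The <atom>
# children of <bracket> are an assumption, not taken from the listing.
SAMPLE_STATE = """
<body>
  <result id="a76">
    <tree step="1" subgoals="2">
      <subgoal id="1">
        <given>
          <bracket>
            <atom kind="free">A</atom>
            <symbol name="and">and</symbol>
            <atom kind="free">B</atom>
          </bracket>
        </given>
        <goal>
          <bracket>
            <atom kind="free">B</atom>
          </bracket>
        </goal>
      </subgoal>
      <subgoal id="2">
        <given>
          <bracket>
            <atom kind="free">A</atom>
            <symbol name="and">and</symbol>
            <atom kind="free">B</atom>
          </bracket>
        </given>
        <goal>
          <bracket>
            <atom kind="free">A</atom>
          </bracket>
        </goal>
      </subgoal>
    </tree>
  </result>
</body>
"""


def render(node):
    """Flatten the children of a <given>, <goal> or <bracket> into text."""
    parts = []
    for child in node:
        if child.tag == "bracket":
            # Nested brackets are rendered recursively in parentheses.
            parts.append("(" + render(child) + ")")
        else:
            parts.append((child.text or "").strip())
    return " ".join(part for part in parts if part)


def parse_state(xml_text):
    """Return the proof step number and a list of subgoal dictionaries."""
    tree = ET.fromstring(xml_text.strip()).find("./result/tree")
    subgoals = []
    for subgoal in tree.findall("subgoal"):
        subgoals.append({
            "id": subgoal.get("id"),
            "givens": [render(g) for g in subgoal.findall("given")],
            "goal": render(subgoal.find("goal")),
        })
    return int(tree.get("step")), subgoals


if __name__ == "__main__":
    step, subgoals = parse_state(SAMPLE_STATE)
    for sg in subgoals:
        print(sg["id"], "; ".join(sg["givens"]), "==>", sg["goal"])
```

Because each given and goal is a separate element, the client needs no knowledge of Isabelle's textual goal syntax to locate them, which is not the case for the statedisplay text in the PGIP message.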

Appendix B

Natural Deduction Rules

These rule figures were all taken from Tobias Nipkow, Lawrence C. Paulson, and Markus

Wenzel. Isabelle/HOL - A Proof Assistant for Higher-Order Logic, volume 2283 of

Lecture Notes in Computer Science, Springer, 2002.

∧i

→ i

∃i

∀i

= i

¬i

∨i1

∨i2

classical

∧ε1

∧ε2

∀ε

∃ε

∨ε

= ε1

= ε2

¬ε

→ ε

PBC

LEM

Appendix C

Questionnaire

Questionnaire for web based, natural deduction proof editor.

The system to be tested is being developed as my MSc summer project. It is meant as a tool for creating formal proofs in Natural Deduction, and is mainly aimed towards novices and non-experts. The system is meant to address the following:

1. Make it easier to visualize proofs.

2. Remove the need for specific knowledge about the underlying interactive theorem prover.

3. Make theorem proving more available in terms of removing the need to install software locally.

The aim of this user test is to evaluate the usability and appropriateness of the system, its stability, and the design of the user interface.

The user test requires you to perform some simple Natural Deduction theorem proving using the web application, and should not take more than 10-15 minutes of your time. In order to take part in this test, you must have access to the Mozilla Firefox web browser (1.5+, 2.0+), and be comfortable with first-order logic.

The data acquired in this questionnaire will be kept anonymous. No personal details will be asked for. If you have any questions, do not hesitate to contact me:

Jonas Halvorsen

[email protected]

Part 1:

1. On a scale from 1-5, how confident do you feel with logic and performing natural deduction proofs?

2. Do you know how to use Fitch-style / Box-style notation of natural deduction proofs?

Note: if you don't know this notation style, you might want to take a quick look at http://www.danielclemente.com/logica/dn.en-node15.html for a short introduction to Fitch-style notation.

3. On a scale from 1-5, how comfortable are you with performing pen-and-paper natural deduction proofs?

4. Do you have any experience with using an interactive theorem prover? If so, list the systems that you have used.

5. Look at the following linear natural deduction proofs. Can you follow them?

[1]

[2]

6. Now look at the proofs below. Can you follow them?

[3]

[4]

7. Which of the two proof styles (question 5 style vs. question 6 style) did you find easier to follow? Why?

Part 2:

1. Load up the BoxProvr webpage in the web browser, and log in. The web address and the login name you have been allocated are written on the first page of this questionnaire.

2. You will now see the main screen without a script loaded. On the left, a menu has appeared with the following sub-headings:

'File': File actions such as save, load, new file.

'View': Actions related to what and how information is displayed.

'Script': Non-proof step script actions, such as creating a new theorem to prove.

'Forward': Forward applied rules available.

'Backward': Backward applied rules available.

'Isabelle': Specific Isabelle rules available that usually don't appear in logic books.

3. Go to 'File->Open File'.

a) Pick the file called demo.thy.

You will now see three proofs. Two of them are coloured green to indicate that they are closed proofs. The remaining one is grey, indicating that it is not finished.

4. Go to the 'Forward' menu, and place the mouse cursor over a button. After a short time period a box will appear with an explanation of the rule. All the buttons in the rule application menus have clue-tips.

5. Complete the unfinished proof as follows:

a) Click on the empty dotted box to select the subgoal to work on.

b) Click on the assumption labeled '1'.

c) Click on the '→E' button in the 'Forward' menu. You will now see that the rule has been applied.

d) Finish the subgoal by clicking the subgoal again, and then directly clicking the 'Ass' button (without selecting any assumptions). The box will now turn green and close, indicating that the proof is completed.

At any time, you can right-click on a line arrived at by a rule application, and press 'undo' in the menu that appears to undo the proof up to that step.

6. Now press the 'File->New File' button. Enter the filename: 'one'. A new script will be created, called 'one.thy'.

7. Press 'Script->Add Theorem' to add a new theorem.

a) Enter the text 'one' into the 'name' box.

b) Enter A; B into the 'Assumptions' box. This will add A and B as two separate givens to the sequent to prove (note: this box can be empty if the sequent to prove does not rely on any assumptions).

c) In the 'Goal' box, enter the character 'A', then press the '/\' button and finally enter the character 'B' (you can also enter '(' and ')' to make the proof easier to read).

d) Press 'Proceed'. The theorem will now appear as a new thing to prove in the script.

8. Select the subgoal as before. Press the 'Backward->/\I' button. Now close both the subgoals by the regular step of selecting the subgoal and applying 'Ass'.

Note: you can, at any time, view the Isabelle proof script by 'Script->View Script'.

Part 3:

Now that you know the basics, try to perform 2-3 of the following proofs. Write down any problems you encounter.

1. A ∧ B → B ∧ A

2. (A → (¬B)) = (B → (¬A))

3. (¬(∀x. P x)) → (∃x. (¬(P x)))

4. ¬(∀x. (P x)) = ∃x. (¬(P x))

5. ∃y. (∀x. (P x y)) ∀x. (∃y. (P x y))

6. (∀x. (A → (B x))) → (A → (∀x. (B x)))

7. (∃x. (P x)) ∨ (∃x. (Q x)) ∃x. ((P x) ∨ (Q x))

Part 4:

1. Did you manage to use the system? If not, what was the problem?

2. Were there any errors that appeared that prevented you from performing proofs?

3. Do you feel that the system aids in performing natural deduction proofs, in terms of proof representation and not needing knowledge about the prover syntax?

4. Were there any aspects that seemed to go against what you associate with box-proofs?

5. Do you feel that the system would be useful for a novice user, as a tool to perform formal proofs in natural deduction?

6. Would the system be of any use to you personally? Why?

Part 5:

System Usability Scale [5]

© Digital Equipment Corporation, 1986. (released to public domain)

Each statement is rated from 1 (Strongly disagree) to 5 (Strongly agree).

1. I think that I would like to use this system frequently   1 2 3 4 5

2. I found the system unnecessarily complex   1 2 3 4 5

3. I thought the system was easy to use   1 2 3 4 5

4. I think that I would need the support of a technical person to be able to use this system   1 2 3 4 5

5. I found the various functions in this system were well integrated   1 2 3 4 5

6. I thought there was too much inconsistency in this system   1 2 3 4 5

7. I would imagine that most people would learn to use this system very quickly   1 2 3 4 5

8. I found the system very cumbersome to use   1 2 3 4 5

9. I felt very confident using the system   1 2 3 4 5

10. I needed to learn a lot of things before I could get going with this system   1 2 3 4 5

Part 6:

1. Note two things that you liked about the system:

2. Note two things you disliked about the system.

3. Any other comments?

Thank you for your time!

[1] Allen, C., & Hand, M. (2001). Logic Primer. Cambridge: MIT Press. p. 169

[2] Allen, C., & Hand, M. (2001). Logic Primer. Cambridge: MIT Press. p. 119

[3] Hodkinson, Ian (2006). 140 Logic. http://www.doc.ic.ac.uk/~imh/teaching/140_logic/140.pdf

[4] Huth, M., & Ryan, M. (2004). Logic in Computer Science. Cambridge: Cambridge University Press. p. 118

[5] Brooke, J. (1986). SUS - A quick and dirty usability scale. Digital Equipment Corporation, Ltd.
