Web Based GUI
for Natural Deduction Proofs
in Isabelle
Jonas Halvorsen
Master of Science
Artificial Intelligence
School of Informatics
University of Edinburgh
2007
Abstract
It is fair to say that the use of interactive theorem provers is mostly limited to experts
in the field. This project attributed this mainly to the high barrier of entry associated
with using interactive theorem provers, and that most current systems do not aid the
user in visualizing proofs.
A web-based client/server system with a graphical user interface was designed and im-
plemented that users could use to perform point-and-click natural deduction theorem
proving. The system did not require client users to install software in order to perform
proofs, as the system was accessible through the use of a web browser. Proofs were
visualized in box-style notation, and proof construction done by performing point-and-
click actions on this. The sound and widely used interactive theorem prover Isabelle
was used for verifying the proofs created. The system was deemed successful, based
on the analysis of a user test.
Acknowledgements
First, I would like to thank my project supervisor, Dr. Jacques Fleuriot, for his helpful
guidance and exceptional dedication to the project undertaken. His extraordinary en-
thusiasm drove the project forwards in difficult times, and his knowledge in the subject
field of interactive theorem proving proved invaluable.
I would also like to thank Sean Wilson for his valuable contribution to the project
in terms of comments and support. His meticulous corrections to the report were very
helpful.
Contents
1 Project Statement 10
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2 Description of Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.1 Expert knowledge . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.2 Limited availability . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3 Project Aim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4 Project Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2 Background and Existing Work 14
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Interactive Theorem Provers . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.1 Isabelle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Proof Editors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3.1 Proof General . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3.2 Pcoq, LogiCoq and IsaWin . . . . . . . . . . . . . . . . . . . . . 16
2.3.3 Pandora . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3.4 Jape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3.5 System Coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3.6 ProofWeb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4 Graphical Proof Representation . . . . . . . . . . . . . . . . . . . . . . . 20
2.4.1 Fitch-style notation . . . . . . . . . . . . . . . . . . . . . . . . . 21
3 Requirements 23
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2 Functional Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.1 Box-style Notation . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2.2 Point-and-click Proof Creation . . . . . . . . . . . . . . . . . . . 24
3.2.3 Store proof scripts. . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2.4 Open proof scripts. . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2.5 Verify proof scripts . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3 User Interface Requirements . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3.1 Easy to understand . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3.2 Hide theorem prover syntax . . . . . . . . . . . . . . . . . . . . . 25
3.3.3 Provide help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.4 Accessibility and Performance Requirements . . . . . . . . . . . . . . . . 26
3.4.1 Provide theorem proving remotely . . . . . . . . . . . . . . . . . 26
3.4.2 Appear to work locally . . . . . . . . . . . . . . . . . . . . . . . . 27
4 System Specifications and Design 28
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.2 Technology and External Software . . . . . . . . . . . . . . . . . . . . . 30
4.2.1 Isabelle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.2.2 PGIP and Proof General Kit Broker . . . . . . . . . . . . . . . . 30
4.2.3 AJAX and jQuery . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2.4 PHP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2.5 MySQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.3 Conceptual System Design . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.3.1 System Overview of Architecture . . . . . . . . . . . . . . . . . . 35
4.3.2 Web Service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.3.3 Web Client . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.4 Persistent Storage . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.4 User Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.5 Design Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5 Implementation 40
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.2 Web Service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.2.1 PGKit Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.2.1.1 Available version non-working . . . . . . . . . . . . . . . 41
5.2.1.2 Instability . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.2.1.3 Few PGIP commands implemented . . . . . . . . . . . . 42
5.2.1.4 Complexity of PGIP protocol . . . . . . . . . . . . . . 43
5.2.1.5 PGIP missing a remove object command . . . . . . . . 44
5.2.2 Isabelle Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.2.2.1 Missing XML output feature . . . . . . . . . . . . . . . 44
5.2.2.2 PGIP communication . . . . . . . . . . . . . . . . . . . 46
5.2.3 PHP issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2.3.1 Simplexml and mixed content nodes . . . . . . . . . . . 46
5.2.3.2 Security mode restrictions . . . . . . . . . . . . . . . . . 47
5.3 Web Client . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.3.1 Creating and Displaying the proof hierarchy . . . . . . . . . . . . 47
5.3.1.1 Isabelle does not recall proof history. . . . . . . . . . . 47
5.3.1.2 Isabelle's subgoal numbering . . . . . . . . . . . . . . . 47
5.3.1.3 Repeated assumption listings . . . . . . . . . . . . . . . 48
5.3.1.4 Converting from XML to HTML . . . . . . . . . . . . . 48
5.3.1.5 Javascript timeout . . . . . . . . . . . . . . . . . . . . . 49
5.3.2 Point-and-click Proof Creation . . . . . . . . . . . . . . . . . . . 49
5.3.2.1 Available proof rules . . . . . . . . . . . . . . . . . . . . 49
5.3.2.2 Proof dependencies and reuse . . . . . . . . . . . . . . . 50
5.3.2.3 Isabelle's lack of labelling . . . . . . . . . . . . . . . . . 51
5.3.2.4 Replaying proof . . . . . . . . . . . . . . . . . . . . . . 52
5.3.2.5 Instantiation of variables in proof rules . . . . . . . . . 53
5.3.2.6 Closed subgoals . . . . . . . . . . . . . . . . . . . . . . 53
5.3.2.7 Closing subgoals . . . . . . . . . . . . . . . . . . . . . . 54
5.3.2.8 Undoing a proof step . . . . . . . . . . . . . . . . . . . 54
5.3.2.9 Web browser compatibility . . . . . . . . . . . . . . . . 54
5.3.3 Message Ordering . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.3.3.1 Initial solution . . . . . . . . . . . . . . . . . . . . . . . 56
5.3.3.2 Revision 2: callback functions . . . . . . . . . . . . . . . 56
5.3.3.3 Revision 3: ajaxQuery plug-in . . . . . . . . . . . . . . 56
5.3.4 User Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.3.4.1 Expanding proofs . . . . . . . . . . . . . . . . . . . . . 57
5.3.4.2 Colour scheme . . . . . . . . . . . . . . . . . . . . . . . 58
5.3.4.3 Help clues . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.3.4.4 Box hierarchy . . . . . . . . . . . . . . . . . . . . . . . . 58
5.3.4.5 Menu panel vs. Toolbar . . . . . . . . . . . . . . . . . . 59
5.3.4.6 Layout of proofs . . . . . . . . . . . . . . . . . . . . . . 59
5.3.4.7 Viewable proof script . . . . . . . . . . . . . . . . . . . 60
5.3.4.8 Graphical confirmation of finished proofs . . . . . . . . 60
5.3.4.9 Show/hide proofs and proof branches . . . . . . . . . . 60
5.3.4.10 Meta-variables. . . . . . . . . . . . . . . . . . . . . . . . 61
5.3.4.11 Drag-and-drop vs. clickable expressions. . . . . . . . . . 61
5.3.4.12 Selecting subgoal to work on . . . . . . . . . . . . . . . 62
5.3.4.13 Showing natural deduction rule names . . . . . . . . . . 62
5.3.4.14 Using mathematical symbols . . . . . . . . . . . . . . . 62
5.3.4.15 Right-click undo . . . . . . . . . . . . . . . . . . . . . . 63
5.3.4.16 Customizability . . . . . . . . . . . . . . . . . . . . . . 63
5.4 User-story Walkthrough . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6 Evaluation 78
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.2 Test Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.3 Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.3.1 Testing Framework and Tools . . . . . . . . . . . . . . . . . . . . 79
6.3.2 Unit Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.3.3 Integration testing . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.3.4 System Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.3.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.4 Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.4.1 User Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.4.1.1 Questionnaire . . . . . . . . . . . . . . . . . . . . . . . . 82
6.4.2 Results of User Test . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.4.2.1 Questionnaire part 1 . . . . . . . . . . . . . . . . . . . . 83
6.4.2.2 Questionnaire parts 2-4, 6 . . . . . . . . . . . . . . . . . 84
6.4.2.3 Questionnaire part 5 . . . . . . . . . . . . . . . . . . . . 85
6.4.3 Summary of Results . . . . . . . . . . . . . . . . . . . . . . . . . 86
7 Discussion 87
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7.2 Achievements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7.3 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7.4 Criticism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7.4.1 General Architecture . . . . . . . . . . . . . . . . . . . . . . . . . 88
7.4.2 PGKit Broker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7.4.3 Isabelle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7.4.4 Web Client . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7.5 Outlook on Subject . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7.6 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
7.7 Final Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Bibliography 93
Appendices
A XML Messages 98
A.1 PGIP reply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
A.2 PGIP state display vs. revised XML . . . . . . . . . . . . . . . . . . . . 99
B Natural Deduction Rules 102
C Questionnaire 104
List of Figures
2.1 Proof General interface to Isabelle . . . . . . . . . . . . . . . . . . . . . 16
2.2 Pandora interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3 Jape interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4 ProofWeb interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.5 Box-style proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.6 Gentzen-style proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.1 Use-Case Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.2 PGKit broker system architecture . . . . . . . . . . . . . . . . . . . . . 31
4.3 PGIP message exchange . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.4 Architectural Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.5 Sequence Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.6 Example Interaction Sequence . . . . . . . . . . . . . . . . . . . . . . . . 36
4.7 Draft GUI 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.8 Draft GUI 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.1 Proof by Contradiction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.2 Law of Excluded Middle . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.3 Applying forward rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.4 Guide to user interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.5 Help clue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.6 Menu panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.7 Proof script . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.8 Selecting assumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.9 Login screen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.10 Empty desktop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.11 Creating a new file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.12 Empty file created . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.13 Add theorem button . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.14 Theorem definition prompt . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.15 New theorem added to script . . . . . . . . . . . . . . . . . . . . . . . . 68
5.16 → i button . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.17 Applying → i backwards . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.18 PBC button . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.19 Applying PBC backwards the 1st time . . . . . . . . . . . . . . . . . . 70
5.20 ¬ε button . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.21 Specifying new subgoals ( 1st ¬ε backwards) . . . . . . . . . . . . . . . . 72
5.22 Applying ¬ε backwards the 1st time . . . . . . . . . . . . . . . . . . . . 72
5.23 Assumption button . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.24 Closing a branch with an assumption . . . . . . . . . . . . . . . . . . . . 73
5.25 Applying ∀i backwards . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.26 Applying PBC backwards the 2nd time . . . . . . . . . . . . . . . . . . 74
5.27 Specifying new subgoals ( 2nd ¬ε backwards) . . . . . . . . . . . . . . . 75
5.28 Applying ¬ε backwards the 2nd time . . . . . . . . . . . . . . . . . . . . 75
5.29 Instantiating a quantified variable . . . . . . . . . . . . . . . . . . . . . 76
5.30 Applying ∃i backwards . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.31 The finished proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
List of Tables
6.1 SUS evaluation scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Chapter 1
Project Statement
1.1 Introduction
Automated theorem proving is a field within Computer Science that aims to
automate the process of generating and verifying proofs by mechanical means. It has
numerous practical uses such as mechanizing mathematics, discovering novel proofs
(e.g. in Euclidean geometry), verifying algorithms in hardware and software
systems, and developing reasoning systems for intelligent agents. It can improve
our understanding of how humans learn mathematics and automate tasks that we find
tedious [38, pp. 308-315].
However, fully automatic theorem proving is infeasible in most cases because provability
in most non-trivial logics (first-order logic, for example) is undecidable. The sheer size
of the search space within most problem domains and the processing time associated with
searching for proofs are also partly responsible for making it infeasible.
Human-guided theorem proving, also called interactive theorem proving, has however
been shown to be of use in practice. This involves combining human intuition
and software automation in order to create proofs, and as a result it is often able to
reduce the search space significantly (compared to fully automatic systems) and thus
find proofs within realistic time [38, pp. 308-315].
Interactive theorem provers frequently rely on using symbolic logic in order to represent
theorems in mathematics or to mimic human reasoning. Novices wishing to use interac-
tive theorem provers thus have to go through the processes of learning a symbolic logic
language, such as propositional logic or predicate logic, while simultaneously learning
how to perform theorem proving within these logics [31].
Theorem proving within these logics is often done using natural deduction, a formal
reasoning model that aims to mimic human reasoning in order to make proofs easier to
perform by humans [27, pp. 6-27].
There is a steep learning curve associated with using interactive theorem provers since
one has to learn how the systems work and get to grips with the specifics of their
underlying proof languages in addition to the symbolic logic languages mentioned above.
This is further complicated by the user having to set up an interactive theorem proving
environment.
1.2 Description of Problem
The use of interactive theorem provers is mostly limited to experts in the field [26, 43].
This project aims to make theorem proving more widely used by reducing the
high barrier of entry associated with using interactive theorem provers and by aiding the
user in visualizing proofs performed using natural deduction.
1.2.1 Expert knowledge
The area of interactive theorem prover user interfaces has had limited attention and
success compared to the development of the theorem provers themselves. Although there
have been several previous projects aimed at creating user interfaces for theorem provers,
they have had little impact in the field and uptake has in general been quite low (the
Proof General generic interface application in Emacs being the exception) [26, 39, 43].
The few user interfaces that do exist rely mostly on textual representation of proofs,
and the general consensus is that they do not do a particularly good job of presenting
proofs in a form appropriate for human reading. Because appropriate user
interfaces are lacking, the area of theorem proving has a high barrier of entry, which limits its use
to experienced proof experts who know the prover's syntax and intricacies [26].
In order to reach a wider audience, it is necessary for the systems to be more convenient
to use and more accommodating towards non-experts such as novice students or experts
in other academic fields who need to perform theorem proving. One way to cater
for this is to provide effective user interfaces to the theorem provers that offer
point-and-click functionality and visualize formal proofs in a way appropriate for human
reading [39, 43, 47].
1.2.2 Limited availability
Another limit to the widespread use of theorem provers is the complexity of setting up
and running interactive theorem prover systems [30, 39].
Isabelle and many other interactive theorem provers usually only readily support the
architecture/operating system used by their developers (often Unix or Linux), creating a
barrier of entry in terms of the required platform. They can also require the presence of
dependency software, e.g. specific compilers [34]. For example, it is quite manageable to
get Isabelle working under Linux, but more difficult to get it running under Microsoft
Windows XP. Additionally, theorem proving often involves processes that can be resource
intensive (e.g. performing pattern matching and search), increasing local hardware
requirements for the prospective user [30].
1.3 Project Aim
To achieve the goal of making it easier for users to perform natural deduction theorem
proving, the project should address both the graphical representation and interactive
editing of proofs as well as the accessibility in terms of installing and running theorem
proving software.
The solution that we propose is to develop a client/server system that provides a web-
based interactive point-and-click graphical user interface that users can use to perform
natural deduction proofs in the higher order logic of Isabelle. It is hoped that this system
will make interactive theorem proving easier to perform, in terms of visualization and
removing the need for client-side installation, and thus make it more available for a
wider audience.
By providing the functionality of creating proofs by point-and-click interaction, the
system should remove the need for the user to learn prover-specific commands. This
should especially benefit novices in the field of logic, as it is difficult to learn and
understand the specific proof language of an interactive theorem prover in addition to
symbolic logic. Additionally, the system should represent proofs in box-style notation,
which is the style that most logic textbooks use [1], in order to make proof construction
easier to follow.
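To make the idea of box-style notation concrete, the following is a minimal sketch (not the system's actual implementation) of how a box proof could be held as a nested structure and printed as numbered lines. The names `Line`, `Box`, `render` and `example_proof`, the ASCII connectives and the rule labels are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Line:
    """One proof line: a formula and the rule that justifies it."""
    formula: str
    rule: str


@dataclass
class Box:
    """A box opened by an assumption; it may contain lines and nested boxes."""
    items: List[Union["Line", "Box"]] = field(default_factory=list)


def render(items, depth=0, start=1):
    """Return (text_lines, next_line_number) for a list of lines and boxes."""
    out = []
    n = start
    for item in items:
        if isinstance(item, Box):
            border = "   " + "| " * depth + "+--"
            out.append(border)                      # top edge of the box
            sub, n = render(item.items, depth + 1, n)
            out.extend(sub)
            out.append(border)                      # bottom edge of the box
        else:
            out.append(f"{n:>2} " + "| " * depth + f"{item.formula}  ({item.rule})")
            n += 1
    return out, n


def example_proof():
    """Hypothetical proof of ~p from the premises p -> q and ~q."""
    return [
        Line("p -> q", "premise"),
        Line("~q", "premise"),
        Box([
            Line("p", "assumption"),
            Line("q", "->e 1,3"),
            Line("False", "~e 4,2"),
        ]),
        Line("~p", "~i 3-5"),
    ]


if __name__ == "__main__":
    text, _ = render(example_proof())
    print("\n".join(text))
```

The nesting depth of a line corresponds to the number of assumptions currently open, which is the property that makes box proofs easy to follow: everything derived inside a box depends on the assumption at its top.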
By having a client/server architecture to the theorem prover engine that users can access
using a web-based user interface, the system should address the issue of availability.
Users should be able to perform interactive theorem proving using a web browser that
communicates asynchronously with a web service (so as to act as if it is being run
locally) that interacts with the Isabelle theorem prover.
1.4 Project Objectives
The envisioned system should provide the following features in order to reach the out-
lined aims of the project:
1. It should present the user with a graphical user interface.
2. It should present formal proofs in box-style notation.
3. It should provide the functionality to perform point-and-click creation of formal
natural deduction proof scripts.
4. It should provide a client/server architecture that a user can access through a web
browser.
5. It should be responsive so that the user feels like the system is being run locally.
Chapter 2
Background and Existing Work
2.1 Introduction
This chapter introduces relevant background knowledge in the field of proof editors for
use in interactive theorem proving. It touches upon topics relevant to this project such
as interactive theorem provers, proof editors and graphical proof representation.
2.2 Interactive Theorem Provers
The development of interactive theorem provers is an active area of research and there
are a number of theorem provers in existence and use. Interactive theorem provers such
as Coq [41], HOL [42] and Isabelle [34, 37] are just some of the popular ones within
the field [43]. They each have individual strengths and weaknesses, making some more
appropriate than others in certain situations. As each system has its own
proof script language and works in a different way, it is difficult for users
to switch between interactive theorem provers. As a result, interactive
theorem prover users tend to stick to the system they are already familiar with.
2.2.1 Isabelle
Isabelle [37] is a general purpose theorem prover, developed by Larry Paulson and
Tobias Nipkow. It is written in the ML functional programming language. It caters
for a wide range of different logics, from classical to intuitionistic logic, propositional
to higher-order logic, including set theory, and is widely used and known to be a sound
system [37]. Furthermore, Isabelle has its own meta-logic (a higher-order intuitionistic
logic) which it uses to define other logics, statements and inference rules.
Isabelle follows the Logic for Computable Functions (LCF) [23] approach, similar to
systems such as Coq and HOL. The LCF approach is based around an abstract data
structure that represents a theorem, and only inference rules are allowed to operate on it.
This design makes it relatively easy to implement in a programming language and also
makes the system easily extensible in terms of defining new rules based on the core
inference rules. Another benefit of using the LCF approach is that one is allowed to
work backwards from a goal, which can make the proof easier to direct [37].
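The LCF idea can be illustrated with a small sketch. This is not Isabelle's kernel; it is a toy implicational logic written in Python, and every name in it (`Imp`, `Theorem`, `assume`, `imp_intro`, `imp_elim`, `prove_example`) is an assumption made for illustration. In ML the abstraction barrier is enforced by the type system; here a private token only approximates it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Imp:
    """The formula 'left --> right'; atomic formulas are plain strings."""
    left: object
    right: object


_KERNEL_KEY = object()  # private token held only by the rules below


class Theorem:
    """A sequent 'hyps |- concl' that was derived by inference rules."""

    def __init__(self, hyps, concl, key=None):
        if key is not _KERNEL_KEY:
            raise TypeError("theorems can only be built by inference rules")
        self.hyps = frozenset(hyps)
        self.concl = concl


def assume(a):
    """Assumption rule: A |- A."""
    return Theorem({a}, a, _KERNEL_KEY)


def imp_intro(a, th):
    """From hyps |- B derive hyps minus {A} |- A --> B."""
    return Theorem(th.hyps - {a}, Imp(a, th.concl), _KERNEL_KEY)


def imp_elim(th_imp, th_a):
    """From |- A --> B and |- A derive |- B (modus ponens)."""
    if not isinstance(th_imp.concl, Imp) or th_imp.concl.left != th_a.concl:
        raise ValueError("rule does not apply")
    return Theorem(th_imp.hyps | th_a.hyps, th_imp.concl.right, _KERNEL_KEY)


def prove_example():
    """Derive |- A --> ((A --> B) --> B) using only the rules above."""
    a, a_imp_b = "A", Imp("A", "B")
    th = imp_elim(assume(a_imp_b), assume(a))   # A, A --> B |- B
    th = imp_intro(a_imp_b, th)                 # A |- (A --> B) --> B
    return imp_intro(a, th)                     # |- A --> ((A --> B) --> B)
```

Because new rules can only be written by composing the existing ones, any derived rule inherits the soundness of the core rules, which is the extensibility benefit described above.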
Isabelle proof scripts consist of a sequence of proof commands. The proof scripts
can either be written in raw ML (tactical style) or in the Isar language [35], Isabelle's
own scripting language. Isar aims to make proof scripts more readable for humans.
Furthermore, Isar allows one to write in different proof styles: declarative and procedural
[35].
2.3 Proof Editors
Current proof editors can roughly be divided into two groups: text-based general
proof editors mainly aimed at experts, and graphical, specialized (e.g. natural
deduction) proof editors aimed mainly at teaching the concept of logical proofs to novice
students. Within the latter group there seem to be several different systems available
with more or less the same functionality. A common pattern within this group is that
the systems are seldom used outside the educational institution where they were
created. For both groups of proof editors, adoption has been
limited.
There are some user interfaces that have gained reasonable popularity. The most notable
is Proof General [2].
2.3.1 Proof General
Proof General is a generic interface for proof assistants [6] that represents proofs and
proof steps textually within the Emacs text editor (see Figure 2.1).
The system is widely used due to its book-keeping functions and its support for a
wide range of theorem provers. It provides to some extent a simpler interaction model
than interacting directly with the prover engine, and supports displaying script language
terms as symbols for easier comprehension (e.g. representing logical implication with →).
It does to a limited extent provide facilities for point-and-click functionality, although
there is very little emphasis on this, and it is only provided when used with the Lego
theorem prover.
Figure 2.1: Proof General interface to Isabelle [6]
Proof General's main features and emphasis are on script management, script navigation
and performing basic bookkeeping. The major drawback with Proof General from the
user's perspective is that the user interface does little to represent proofs in a more
manageable way for users to understand. It still mainly relies on users entering
prover-specific commands into a text script. However, it should be mentioned that Proof General
was not designed with the novice in mind and is aimed towards power users, as its web
page states [2]. Work is currently being done to port Proof General's
functionality from Emacs to Eclipse as the underlying platform [6]. The
motivation for this is to overcome weaknesses in the current system by
lessening the cognitive burden on the user (associated with the complicated interface
of Emacs), and to improve maintainability by removing the close coupling to
the Emacs Lisp API [47].
2.3.2 Pcoq, LogiCoq and IsaWin
Another proof editor that had previously acquired interest in the interactive theorem
proving community was Pcoq [9] which was an interactive proof assistant for the Coq
theorem prover. It provided the ability to do point-and-click interactive theorem prov-
ing on Coq proof scripts. However, the project has been on hiatus since 2003 and seems
to be no longer active. There also existed a version of Pcoq called LogiCoq [32], which
was accessible through the Web. Unfortunately, this project too no longer seems active.
The IsaWin project [6], a generic graphical user interface for the Isabelle prover with
similar functionality to Pcoq, seems to have experienced the same fate. What the
aforementioned proof editors have in common is that they are all intended for general
purpose theorem proving, and aim to serve a wider field than just natural
deduction proofs in higher-order logic. The user is required to have more specific knowledge
in order to use the proof editor as it is not tailored for a specific use, nor is the proof
visualization optimized for the proof type at hand. It would be beneficial for users
performing interactive theorem proving to have the proof at hand visualized in a suitable
manner, i.e. box proofs for natural deduction based proofs or tree/graph structures for
inductive proofs.
2.3.3 Pandora
Pandora is a tool used for teaching natural deduction in first-order logic (see Figure 2.2).
It is developed at Imperial College London and is used extensively in their undergraduate
introductory course to logic [14]. The system has been in existence since 1996, and the
current version of the system (version 3) runs as a Java applet within a web browser.
It provides point-and-click creation of natural deduction proofs through the use of a
graphical user interface. It renders proofs in box-style/Fitch-style notation, and has
an extensive tutorial and help library built in to guide the user in learning logic. The
system does not rely on an established theorem prover in the background, relying
instead on its own simple proprietary proof verification engine.
Figure 2.2: Pandora interface [14]
2.3.4 Jape
One system that does stand out as being different within the group of proof editors
with graphical user interfaces is Jape [10, 11, 12], developed by Richard Bornat and
Bernard Sufrin. It is referred to as a proof calculator rather than a teaching tool,
and was designed as an interactive proof support tool with a high-quality graphical
interface (see Figure 2.3). Jape is different in that it is marketed as a general tool
that can be used to create logic-specific point-and-click interactive proof editors [12].
The proof rules to be available in the proof editor are specified in a simple syntax. For
example, a lecturer teaching logic can specify that only certain rules are available,
thus restricting the students to performing proofs using only these rules.
Although Jape is geared as a tool for creating proof editors, it has so far mainly been
used for teaching logic and is used at a handful of educational institutions. As it was
not intended as a teaching tool, it does not provide any help or advice to users on how
to perform proofs, making it somewhat ill-suited for teaching logic.
It is difficult to place Jape in a clear category as to what kind of proof tool it is. It is
not a teaching tool per se, and the authors state that it is not a theorem prover
either, as it only reacts passively to user actions [12]. However, its authors also
note that the application can be used by experts to develop small-sized logical systems.
As a result, the system fits in between teaching tools and proper proof editors that use
interactive theorem provers.
Figure 2.3: Jape interface [10]
2.3.5 System Coupling
The consensus is that proof editors have suffered in the past as they were closely coupled
with the theorem prover [6, 30, 39, 43, 46]. The creation of the theorem prover engine
usually had priority and the creators seldom focused on the HCI aspects of the user
interfaces. As a result, the user interfaces had little success. Furthermore, due to
the close coupling between theorem prover and user interface, it would be difficult for
other people to create third-party user interfaces that could interact with the theorem
prover as the designers had not envisioned the use of external user interface applications.
Current research seems to agree that theorem provers and user interfaces should have
a client/server (or distributed) architecture so that they can be loosely coupled. This
opens up the opportunity of having the theorem prover and proof editor running on
different machines [6, 30, 39, 43].
Notable examples of systems that use a client/server architecture are ProofWeb [30, 31],
LogiCoq [32], LΩui [39] and to some extent Proof General for Eclipse [3]. There are as
many frameworks for composite proof architecture as there are implemented systems.
ProofWeb, LΩui and the Proof General project all outline frameworks that could be
used. With the exception of the framework outlined by the Proof General project, where
the so-called Proof General Interaction Protocol is used to interact with the theorem
provers, they have all taken ad-hoc approaches without any aim to standardize the
protocols.
2.3.6 ProofWeb
ProofWeb [30, 31], like Pandora, is intended for use in teaching logic to undergraduate
students. It differs from both Jape and Pandora in that it is provided through a
centralized server, accessible through the use of a web browser. Furthermore, it utilizes
a proper interactive theorem prover (Coq) rather than a lightweight proprietary proof
verifier [30, 31]. Its aim is not just to teach logic, but also to teach users how to perform
proper theorem proving (on a real theorem prover rather than a toy system) and how to
create proof scripts [31]. This differs from Pandora, which only aims to teach the concept
of creating logical proofs. ProofWeb does not support creating proofs by point-and-click
and suffers from the limitations of textual user interfaces.
[31]
Figure 2.4: ProofWeb interface
2.4 Graphical Proof Representation
It seems that the issue of graphical point-and-click versus textual proof script creation
is still an open question. Although it is generally accepted that graphical representation
is needed in order to lessen the cognitive burden on users and lower the barrier of
entry [43, 26], expert users of interactive theorem provers can sometimes feel that such
systems hinder their work by forcing limitations upon them. This might be a reason as
to why development in the area has been slow.
A further issue regarding graphical proof representation resides in the display style.
Different kinds of logic proofs benefit from different types of representation style. Thus,
there is no one style that is universally the best style for theorem proving. So far, the
style that is applicable to the widest range of proofs (disregarding the traditional
linear representation style) is that of representing proofs as trees. A noteworthy
mention here is HiProofs, which is a specific representation of tree proofs [16]. There
is, however, a problem with representing proofs as trees, as many proofs are cyclic and
are thus more appropriately represented as graphs. Furthermore, HiProofs are not well
suited for larger proofs as they tend to introduce significant branching [43].
For the area of theorem proving in natural deduction, the representation styles used
can roughly be narrowed down into three types:
• Linear representation (Quine-style)
• Representation as proof trees (Gentzen-style)
• Representation involving proof boxes (Fitch-style)
Most of the tutorial systems mentioned in Section 2.3 utilize the box-style notation.
Box-style notation is the notation style used in most logic textbooks and pen-and-paper
proofs [1]. For representing natural deduction proofs it seems to be more appropriate to
use one of the three traditional notation styles. For more general proof editors, however,
this might not be the case.
2.4.1 Fitch-style notation
Fitch-style notation, or box-style notation (see Figure 2.5), is a proof representation
style that is much used within natural deduction theorem proving [1]. It is widely used
in logic textbooks due to its strengths in making proofs easier to follow and read in
comparison to traditional linear representation. The style gives structure to linear proofs
in terms of boxes, which aims to make it easier for users to keep track of how the proof
is proceeding in comparison to purely linear proofs. It also removes the need to repeat
assumptions, which can greatly clutter a proof. Furthermore, it has the advantage of
being more convenient than Gentzen-style proofs (see Figure 2.6) for representing large
proofs. This is due to its linear nature, which prevents it from having the same problem
with branching as Gentzen notation has [31]. In box-style notation, boxes are drawn
around goals to indicate variable scope and to visualize the nesting of subgoals incurred
when working on a proof backwards from the goal.
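As a rough illustration of how boxes add structure to an otherwise linear proof, the following minimal sketch renders a nested proof as numbered text, drawing one vertical bar per level of box nesting. The `Line` and `Box` types, the `render` function and the ASCII rule names are assumptions made for this example only; they are not taken from any of the systems discussed.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Line:
    """A single proof line: a formula and the rule that justifies it."""
    formula: str
    rule: str


@dataclass
class Box:
    """A box: an assumption followed by the lines (or nested boxes) derived under it."""
    items: List[Union["Line", "Box"]] = field(default_factory=list)


def render(items, depth: int = 0, start: int = 1) -> Tuple[List[str], int]:
    """Render a proof as numbered text lines; each nesting level adds one bar."""
    out: List[str] = []
    number = start
    for item in items:
        if isinstance(item, Box):
            # Lines inside a box keep the global numbering but are indented.
            nested, number = render(item.items, depth + 1, number)
            out.extend(nested)
        else:
            out.append(f"{number:>2} {'| ' * depth}{item.formula}  [{item.rule}]")
            number += 1
    return out, number


# Hypothetical example proof of (A & B) -> (B & A).
proof = [
    Box([
        Line("A & B", "assumption"),
        Line("B", "&E 1"),
        Line("A", "&E 1"),
        Line("B & A", "&I 2,3"),
    ]),
    Line("(A & B) -> (B & A)", "->I 1-4"),
]

lines, _ = render(proof)
print("\n".join(lines))
```

The assumption `A & B` appears once, at the top of its box, and every line derived under it is visibly scoped by the bar, which is the property that distinguishes this style from a purely linear listing.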
[14]
Figure 2.5: Box-style proof
[31]
Figure 2.6: Gentzen-style proof
Chapter 3
Requirements
3.1 Introduction
This chapter outlines the requirements that the target system should aim to address.
The requirements were identified based on the description, aim and objective of the
project, listed in Chapter 1, as well as from the Informatics Research Proposal submis-
sion and the initial project proposal by Jacques Fleuriot [28]. These requirements were
further divided into functional requirements, user interface requirements, and accessi-
bility and performance requirements.
Each requirement identified is listed with the following information:
• An explanation of what the requirement is.
• A justification as to why the requirement was needed, and how it solves the
problem outlined in the project.
The requirements outlined should, as much as possible, be free from information regard-
ing implementation and system design as this will be the target of Chapter 4. However,
it is critical to capture the requirements correctly as they will be used when creating
the conceptual design of the software system in order to ensure that the right system is
being built (to solve the problem at hand).
3.2 Functional Requirements
The functional requirements outline the core functionality that the system should provide
in order to successfully address the outlined aims and objectives.
3.2.1 Box-style Notation
Explanation The system should render formal proof scripts (of natural deduction)
in box-style notation.
Justification Representing proof scripts graphically should make it easier for users
to comprehend the formal proofs that exist in a proof script. Box-style notation is a
proof notation style that is widely used in logic textbooks and when teaching logic at
the university level. As a result, it is a notation style that is widely known and the style
that most novices to logic and theorem proving are first introduced to [1]. The style is
often used when performing pen-and-paper proofs as it gives graphical structure to the
proof being created, which makes it easier for humans to comprehend. It is expected
that users will find it easier to understand the structure of the proof if it is represented
in this notation.
3.2.2 Point-and-click Proof Creation
Explanation The user should be able to create a formal logical proof in natural
deduction without having to know the script language of the underlying interactive
theorem prover or having to perform textual creation of the proof script. The user should
be presented with a limited set of natural deduction introduction and elimination rules,
which can be applied to the proof being worked on by point-and-click mouse gestures.
Justification This requirement should make it easier for users to perform natural
deduction proofs as it removes the need for knowledge about how the specic interactive
theorem prover works.
3.2.3 Store proof scripts
Explanation The system should provide the user with the facility to persistently
store a copy of the proof script currently in the buffer, for later retrieval.
Justification The user might want to save an unfinished proof script in order to
return and work on it at a later stage.
3.2.4 Open proof scripts
Explanation The system should provide the user with the option to open previously
created proof scripts for further work (or review).
Justification The user might want to stop working on a proof script, or to continue
working on an unfinished proof script that has previously been stored. Additionally,
the user might want to go back and review previously completed proof scripts.
3.2.5 Verify proof scripts
Explanation The user should be provided with the facility to run the script in a
sound and widely used/accepted interactive theorem prover, and should be shown
the results obtained after each proof step execution (new subgoals).
Justification The user will want to run the proof script to verify that the proof is
correct so far and to see what the formal proof looks like so far. The user can then
apply natural deduction rules to unfinished proofs if required.
3.3 User Interface Requirements
The user interface requirements are requirements that should address how the graphical
user interface should be designed in terms of features it should provide or general design
principles that should be adhered to. The aim of the user interface requirements is to
make sure that the designed user interface is suitable for the users in terms of usability.
3.3.1 Easy to understand
Explanation The user should be presented with a user interface that is easy to under-
stand, simple to use, and not overly complicated. The user interface should be intuitive
and thus should not require that the user memorizes how to operate the system.
Justification The system should have a low barrier of entry to use. Thus the user
interface should cater for non-experts. By keeping the user interface as basic and intuitive
as possible, the user is less likely to get confused.
3.3.2 Hide theorem prover syntax
Explanation The system should remove the user's need to know how to use the
underlying theorem prover in order to perform formal proofs in natural deduction, and
should hide prover-specific information from the user.
Justification In order to reduce the barrier of entry, the user should not be required
to have any specific knowledge of how to use the interactive theorem prover at hand.
Thus, the system should be as generic as possible in terms of mimicking pen-and-paper
box proofs. The user should be presented with generic, well-known natural deduction
rules as they appear in logic textbooks and pen-and-paper proofs rather than the
prover-specific syntax and names of the corresponding tactics.
3.3.3 Provide help
Explanation The user interface should provide simple helpful clues and indications
as to what the different interactive objects in the user interface do. This means explaining
what each button is used for and, in the case of the buttons for applying a proof rule,
giving a short explanation of what the proof rule looks like.
Justification The clues, coupled with the intuitive and simple user interface requirement,
should make it easier for the user to use the system. It should also remove the
need for an extensive user guide as most of the guidance is provided directly to the
user in the user interface when they are likely to need it, rather than having to look up
information in a separate document (printed or electronic format).
3.4 Accessibility and Performance Requirements
Accessibility requirements refer to how easy it is for a new user to begin using the system
for interactive theorem proving, in terms of system requirements and the time needed to
get a system up and running (rather than how easy it actually is to use the
system to do theorem proving). Performance requirements, on the other hand, refer to
how responsive the system is from the user's perspective.
3.4.1 Provide theorem proving remotely
Explanation The system should provide users with the facilities to perform natural
deduction proofs without requiring that software be installed at the user end (other
than having access to a web browser).
Justification The system should have a low barrier of entry to use. Thus, the system
should not require users to spend time installing and configuring software in order to
start performing formal proofs.
3.4.2 Appear to work locally
Explanation Even though the system is accessed remotely, it should be responsive
and behave as if the application is being run on the local system.
Justification Blocking the web browser and thus forcing the user to have to wait idly
until a processing act has been completed can confuse the user and/or make the user
lose attention. Thus, it is important for users to feel that the system is responsive.
Chapter 4
System Specifications and Design
4.1 Introduction
This chapter presents the conceptual design created in accordance with the requirements
outlined in the previous chapter. The main tasks that the user will be able to perform
when using the system (shown in Figure 4.1) are:
1. Load a proof script into the system to work on.
2. Create a new proof script to work on.
3. Apply natural deduction introduction/elimination rules to the proof script in
focus.
4. Run the proof script at hand in an interactive theorem prover.
5. Save the edited proof script to persistent storage.
Figure 4.1: Use-Case Diagram
In order to address the issue of accessibility and making it easy for new users to use the
system, it was decided that a web-based, client-server architecture would be a suitable
one. Following such an architecture means that the two parts can be implemented
separately, using different programming languages and being run on different platforms
as long as they can still use an agreed method of communication. The current trend is
to create systems that do not require the client to install additional software, using only
a web browser to access the functionality of the system. This trend has been apparent
for the better part of a decade, as evidenced by so-called web applications [22]. The
use of this design solution would hopefully minimize the requirements set on the user
in terms of:
• Operating System Architecture - Requires only a web-browser to use, which is
available on all modern operating systems. Thus the system gives the user greater
freedom as to what operating system they can use.
• Need to install software locally - The server will do most of the underlying pro-
cessing, upon the request of the client. The client will utilize the web server's
functions through the use of the web client application run in a web browser. As
a result, there is no need for the client to install any software locally.
• File space - As the proof scripts are stored and run on the server, and there is
no need to install any software locally, the system does not place any file space
requirement on the client-side (apart from regular web-browser cache and/or
operating system swap file).
• Hardware requirements - As the interactive theorem prover is being run on the
server rather than at the client, the system does not put any specic requirements
on the user's system in terms of the amount of memory required and the speed of
the CPU.
There are some consequences of this choice of architecture. One is that it introduces
a requirement that the user needs to have access to a network connection in order to
communicate with the server. Another consequence of a client-server architecture is that
it introduces a time delay of interaction between the client and server which is avoided
(or relatively negligible) in other architectures. These delays can vary widely according
to the network speed of the connection between client and server. However, there are
techniques and designs that can be used in the client-server architecture in order to
minimize the time delay inflicted. Additionally, the aspect of the system that is most
likely to incur the longest time delay is the processing of a proof step application in the
associated interactive theorem prover, which would not be avoided in other architectures
either.
4.2 Technology and External Software
The conceptual system should rely on a variety of external software and technologies
in order to provide the above-mentioned functionalities. These dependencies and
technologies are explained and justified in this section.
4.2.1 Isabelle
The conceptual system should rely on using the Isabelle general purpose interactive
theorem prover [34] for verifying and processing formal proofs. Isabelle was chosen as
the underlying interactive theorem prover for this project due to its widespread use
within the field of theorem proving, and its general acceptance as a sound system.
Furthermore, the author of this project has previous experience with using the Isabelle
theorem prover and this choice would remove the need for learning the intricacies of a
new interactive theorem prover given the relatively short time-span of the project. For a
description of the Isabelle system itself, see Subsection 2.2.1.
4.2.2 PGIP and Proof General Kit Broker
The conceptual system should utilize the Proof General Interaction Protocol (PGIP) [5]
in order to interact with Isabelle, rather than communicating with Isabelle natively. The
reason for this is that it is difficult to interact directly with Isabelle at the programming
level, as it was designed mainly for direct textual human interaction through a command
terminal. Another motivation for using PGIP is that it is a formally specified XML
language designed to be used for communicating with theorem provers, and was designed
not to be prover specific. Unfortunately, at the time of writing, no theorem prover other
than Isabelle has a PGIP wrapper.
The system should use the Proof General Kit (henceforth PGKit) broker [45] for parsing
proof scripts and issuing commands to Isabelle. This has the additional benefit that PGIP
display commands can be used rather than only PGIP prover commands. This should provide
a more appropriate way of communicating with the theorem prover as the PGIP display
commands are intended to cater for proof editors. Figure 4.2 shows the intended use
of the PGKit broker as envisioned by its authors, Christoph Lüth and David Aspinall
[46].
[46]
Figure 4.2: PGKit broker system architecture
An example of how the PGKit broker can work as a middle-man, between the graphical
user interface and the interactive theorem prover, is shown in Figure 4.3. In the figure,
the PGKit broker translates PGIP display commands (sent from the GUI application)
into PGIP prover commands and sends them to the Isabelle theorem prover. Isabelle's
PGIP wrapper then converts the PGIP prover commands into native Isabelle commands.
Results from Isabelle are passed on by the broker to the GUI application without any
modification. Note that the use of the word command in this scenario refers to system
commands rather than proof script commands. Proof script commands are encapsulated
within the system commands and the contents are copied without alteration between
the translation/conversion stages (e.g. between display and prover commands).
[46]
Figure 4.3: PGIP message exchange
The PGKit broker should also aid the conceptual system with keeping track of modifications
done to the proof script. The PGKit broker is responsible for performing file
operations such as creating, loading and modifying proof script files. Upon the request
to load a proof script, the PGKit broker breaks down the individual lines in the proof
script into objects (which are assigned unique IDs) which it then holds in a local buffer.
The broker does this to simplify the process of editing and maintaining a correct image
of the proof script at all times. Each request by the client to modify the proof script
results in an update of the PGKit buffer. The changes to the script are then passed to
Isabelle for parsing to verify that the new lines in the script are syntactically correct.
Each object added to or modified in the buffer will now be in one of two states: parsed (if
syntactically correct) or unparseable (if syntactically incorrect). An object in the parsed
state can then be requested to have its state changed to processed, which will trigger
Isabelle to execute the proof step and return the resulting proof state. To retract the
processed state (e.g. in order to edit the object), the object's state has to be reverted to
parsed.
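The object life cycle described above can be read as a small state machine. The following minimal sketch models only those three states and the transitions between them; the class and method names are hypothetical and do not correspond to the PGKit broker's actual interface, and the `parses` and `execute` callables merely stand in for Isabelle's syntax check and proof step execution.

```python
from enum import Enum
from typing import Callable


class State(Enum):
    UNPARSEABLE = "unparseable"
    PARSED = "parsed"
    PROCESSED = "processed"


class ScriptObject:
    """One buffered proof script line with a unique ID (hypothetical model)."""

    def __init__(self, obj_id: str, text: str, parses: Callable[[str], bool]):
        self.obj_id = obj_id
        self.text = text
        # A new or modified object is syntax-checked first.
        self.state = State.PARSED if parses(text) else State.UNPARSEABLE

    def process(self, execute: Callable[[str], str]) -> str:
        """Execute the proof step; only allowed from the parsed state."""
        if self.state is not State.PARSED:
            raise ValueError(f"cannot process an object in state {self.state.value}")
        result = execute(self.text)
        self.state = State.PROCESSED
        return result

    def retract(self) -> None:
        """Revert a processed object to parsed so that it can be edited."""
        if self.state is not State.PROCESSED:
            raise ValueError(f"cannot retract an object in state {self.state.value}")
        self.state = State.PARSED
```

Modelling the transitions explicitly makes it easy to reject invalid requests, such as processing an unparseable line, before anything is sent to the prover.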
This kind of work on the script could not have been done client-side as the system runs
within a web-browser, and is thus not permitted to perform file operations by default,
due to security issues. Furthermore, having the web-client deal with script loading,
buffering and issuing prover commands directly to the Isabelle theorem prover would
certainly slow down the system. This is due to the large increase in messages being
sent over the network to Isabelle (each display message sent results in several prover
commands), as well as making the web client heavier and the code more complicated
which could negatively affect the responsiveness of the system.
4.2.3 AJAX and jQuery
The web-based client should rely on using Javascript to perform asynchronous communication
with the server and to dynamically modify the Document Object Model (DOM)
[49] of the web page based upon receiving XML updates. This technique is known as
Asynchronous Javascript and XML (AJAX), a recent trend that builds upon the web
application concept, utilizing web services [22].
AJAX provides a way to make web applications feel more responsive by hiding the
communication between the client and the server from the user and allowing server
requests to be sent asynchronously. Users can continue to interact with the application
while the AJAX engine deals with the interaction requests. As a result, the client's web
browser is not blocked when server requests are sent. The AJAX engine updates the
DOM model of the rendered web page directly upon receiving results, so there is no
need to re-render the whole web page [22, 30].
As there is no need to re-render the whole screen, the server only needs to return the new
data rather than a completely new web page. This reduces the network overhead, which
results in the system feeling more responsive if there is high latency on the network.
The combination of reduced network traffic, the ability to send messages asynchronously
and updating the DOM directly, should make the web application feel more responsive
and mimic that of a locally run application [22, 30].
Several Javascript libraries have appeared lately that aim to simplify the process of creating
AJAX enhanced web applications by providing sets of commonly used functions,
such as traversing the DOM or creating user interface functionality (e.g. drag-and-drop).
There are many competing libraries such as Prototype.js [40], Script.aculo.us [20] and
YahooUI! [54], which all more or less provide the same functionality. The one chosen
for this project is a relatively new library called jQuery. Its strength lies in the way it
makes it easy to traverse and modify the DOM through the use of selector functions
to locate DOM objects based on XPath expressions [52] and/or Cascading Style Sheet
(CSS) attributes [48]. The core jQuery library itself is quite small compared to the
other aforementioned libraries, but it has a large active community contributing with a
vast library of user created plug-ins.
4.2.4 PHP
The server should be written in the PHP scripting language, and should utilize the PHP5
pre-processor for the Apache web server in order for the client to request functions to
be performed server-side through the use of the Hypertext Transfer Protocol (HTTP)
[51]. There were several reasons why the PHP language was chosen for this project:
1. The server has to be able to accept HTTP requests and reply in XML. The PHP5
pre-processor makes it easy to provide an access point to the service, makes HTTP
parameters sent available as global variables, and makes it easy to format proper
XML output.
2. The server has to interact with a database for storage. Database access methods
are built into the PHP5 core by default, which makes it very easy to implement
code that accesses a database for data.
3. The server needs to navigate and modify XML data structures. The PHP5 core
includes the SimpleXML functions, which make it easy to traverse the XML DOM.
In comparison, Java's Saxon package for DOM traversal can be quite cumbersome
to utilize as it requires more code to perform the equivalent operation.
4. The server needs to do substantial textual matching and editing. PHP5 has built-in
support for the use of Perl Regular Expression syntax.
5. PHP is a dynamically and weakly typed language, which removes the need for
explicit casting between types. In comparison, Java is a statically and strongly
typed language, which makes it less suited than interpreted scripting languages
for glueing together other applications (cf. Scripted Components design
pattern) [36]. As the web service needs to perform textual matching and
transformation on the results retrieved from the Isabelle theorem prover, casting to
String type for text character manipulation will be necessary.
6. The PHP scripting language is relatively easy to learn and does not have compli-
cated syntax in comparison to languages such as Java and Haskell. Additionally,
it is well documented with a range of tutorials available in the public domain.
Many of the aforementioned functionalities are available in other programming/scripting
languages. The alternative solutions considered were: Java Server Pages, Python and
Perl.
Java Server Pages were ruled out as they use Java, which is not well suited for scripting
as it can require relatively more code to perform simple tasks such as file handling and
XML traversal [36]. Furthermore, it is statically and strongly typed as noted above,
which makes it less suitable for scripting.
Python shares a number of strengths with PHP. They are both dynamically typed,
interpreted languages, and have a simple syntax. Python was ruled out on the basis
of its syntax being more complex in general than PHP, and based on the author's
subjective evaluation that it is not documented as well as the PHP language.
Perl is a dynamically and weakly typed interpreted scripting language, similar to PHP.
It can be used in combination with the Common Gateway Interface (CGI) [33] to provide
HTTP access and thus be used as a web application. However, it is notoriously difficult
to debug, and has a generally poor coding style, making it difficult to learn and read
[19].
Thus, based on the analysis above, PHP was selected to be used when developing the
system.
4.2.5 MySQL
The server should store data regarding users, proof scripts and broker and prover in-
stances in persistent storage. For easy access as well as catering for possible future func-
tionality expansion, the system should store this information in a relational database
management system. The database system chosen for this project is MySQL.
MySQL was chosen due to its widespread use and that it is readily supported in PHP5,
as well as the fact that it is open source. A contending option was the use of PostgreSQL
which is similar in terms of being supported in PHP5 as well as being open source. There
are no notable advantages or disadvantages in using one over the other in this project.
MySQL was chosen merely due to its popularity in web application development.
4.3 Conceptual System Design
This section presents a more in-depth description of the conceptual design of the system
outlined in Section 4.1.
4.3.1 System Overview of Architecture
As noted in Section 4.1, the conceptual system has a web-based, client/server architec-
ture. Thus, the system as a whole can be divided into client-side and server-side for
further study.
The server in the architecture should be implemented as a web service that provides an
access point for web clients to utilize the server-side functions. The web service should
thus be responsible for interacting with the prover (through the broker) on behalf of
the client.
The web client should be implemented as an AJAX enhanced web-page, accessible from
a web server and run within the user's web browser, that interacts with the web service
in order to provide the user with the functions outlined in Section 4.1.
The general structure of interaction between the client-side and the server-side of the
system is shown in Figures 4.4 and 4.5.
Figure 4.4: Architectural Overview
Figure 4.5: Sequence Diagram
Figure 4.6 shows a more concrete example of interaction between the different components
in the overall system. The diagram represents the process of performing an
implication introduction (impI) step on a goal.
Figure 4.6: Example Interaction Sequence
4.3.2 Web Service
The web service should deal with the interaction between the user and the broker (and
Isabelle). User actions on the web client trigger HTTP requests sent to the web service.
The web service reacts accordingly to the requests, triggering actions on the broker,
parsing its results into XML and returning the parsed results to the web client for
display. The web service should be written in PHP5 and run on a web server (e.g.
Apache).
The concept of web services is often used with varying meaning. The W3C defines a
Web service as
...a software system designed to support interoperable machine-to-machine
interaction over a network. It has an interface described in a machine-
processable format (specifically WSDL). Other systems interact with the
Web service in a manner prescribed by its description using SOAP messages,
typically conveyed using HTTP with an XML serialization in conjunction
with other Web-related standards. [44]
However, the everyday use of the concept is often somewhat less strict. It is common to
talk about using web services for AJAX requests, even though these web services do not
use SOAP [44] nor the Web Services Description Language (WSDL) [44]. For AJAX
applications, web services usually merely receive simple HTTP requests and reply with
plain, unencapsulated XML documents. The reason for this is that the use of SOAP
encapsulation introduces overhead and requires extra processing client-side which AJAX
designers usually try to minimize. This simplified definition of a web service is the
one that will be used in this project.
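To make the simplified notion of a web service concrete, the following minimal sketch builds the kind of plain XML reply, without any SOAP envelope, that such a service might return after a proof step. It is written in Python purely for illustration (the project's server is specified in PHP), and the element and attribute names (`reply`, `action`, `subgoal`, `index`) are assumptions invented for this example rather than the format used by the system.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_reply(action: str, subgoals: List[str]) -> str:
    """Serialize the subgoals left after a proof step as a plain XML document."""
    root = ET.Element("reply", {"action": action})
    for index, goal in enumerate(subgoals, start=1):
        node = ET.SubElement(root, "subgoal", {"index": str(index)})
        # ElementTree escapes characters such as '>' when serializing.
        node.text = goal
    return ET.tostring(root, encoding="unicode")


# Hypothetical reply after splitting a conjunction into two subgoals.
xml_reply = build_reply("process", ["A & B ==> B", "A & B ==> A"])
print(xml_reply)
```

A client receiving such a document only has to walk a handful of elements, which is the reduction in client-side processing that the simplified definition aims for.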
The web service in this application should be implemented as a set of PHP files that
act as access points to the underlying services they provide. These services are:
• Open script
• Create script
• Save current script
• Edit current script
• Process script
Core functions for communicating with the broker, the database and parsing broker
responses, should be delegated to separate files/classes for re-use, maintainability and
to facilitate testing.
4.3.3 Web Client
The web client application should act as the user interface in which the user can perform
theorem proving. The user utilizes the functions offered by the web service indirectly
through the web client.
The web client should be implemented as a single web page - enhanced with Javascript
code to provide AJAX functionality - that passes requests to the web service. Upon
receiving XML updates from the web service (asynchronously), the Javascript in the
web page will control how the web browser renders the results. The dynamic DOM
modication and the creation of AJAX queries by the Javascript will utilize the jQuery
Javascript library (see Subsection 4.2.3 for further explanation) in order to simplify
coding and to easily apply visual enhancements to the web page rendering.
It should be the responsibility of the Javascript in the web client to build the proof
hierarchy for rendering, as Isabelle output does not contain any hierarchical information.
Thus, it is necessary for the web client to keep track of previous results in order to build
the graphical representation. Furthermore, it needs to dynamically transform XML
results into HTML for insertion into the DOM. The Javascript also needs to interpret
user actions in order to generate HTTP requests to be sent to the web service.
4.3.4 Persistent Storage
The PGKit broker should deal with the actual serialization of the proof scripts. Proof
scripts should be stored as regular Isabelle proof scripts with the .thy extension.
User account details should be stored in the MySQL database. The database should
store information regarding username, home directory, broker instance ID, broker network
address, prover instance ID as well as information regarding the proof script at
hand. Since there is relatively little data that is to be stored in the database, it could
be replaced with storing the information in an XML file instead. However, in order
to build in extensibility for unforeseen complications and future expansion, database
access should be used.
In order to reduce the likelihood of misuse of the system, the system will not allow
users to upload proof scripts as files. In order to accept uploaded
proof scripts, the web service would have to save the file in its own filespace before
sending it to the broker, thus opening the way for malicious code to be included in the
proof script which could compromise the system. This risk is reduced if the system
requires that the proof script be parsed by the broker before it is allowed to
be stored on the remote service, as the broker will reject non-parseable data and thus
refuse to save the script. Also, since the user is not allowed to directly manipulate the
proof script when using the web application, other than through the set actions provided
by the web application, this should further reduce the likelihood of misuse in terms of
exploiting possible security loopholes in the PGKit broker or in Isabelle itself. However,
it is possible for a user with malicious intent to bypass the web application itself and
interact directly with the PGKit broker. It was deemed that this issue lies outside the
scope of this project, and is something that should rather be addressed by the PGKit
authors if security loopholes are found. If misuse is a high concern, the system could
be run under a virtual machine or a sandboxed environment to ensure that no harm is
done.
4.4 User Interface
The user interface should be designed in accordance with well established user interface
heuristics [29] and in accordance with the specific user interface requirements outlined in
Section 3.3.
As a brief summary, the user interface should:
• Have a simple and intuitive design.
• Maintain styles consistently.
• Provide clear and easily understandable error messages.
• Provide hints to the user as to how to use the available functions.
Although the user interface design was bound to change substantially during the implementation
stage, a set of mock-ups of the envisioned user interface were made in order
to get a rough guide as to what the proof representation should look like. Figures 4.7
and 4.8 show the envisioned rendering of box-proofs that the system should provide.
Figure 4.7: Draft GUI 1
Figure 4.8: Draft GUI 2
Emphasis was put on mimicking pen-and-paper box proof creation, but at the same time
utilizing the advantages of being able to resize the boxes dynamically. Thus, one can
minimize subgoals in order to remove these from view when concentrating on solving a
specific subgoal.
4.5 Design Summary
The conceptual design of the system should act as a guide during the implementation
of the system. However, implementation stages often involve significant deviations
from the conceptual design, in order to mitigate problems that appear and to more
appropriately address the aims of the project problem at hand.
Chapter 5
Implementation
5.1 Introduction
This chapter covers the implementation stage of the project. It outlines issues that
arose during the project and the design decisions that were taken to mitigate or
overcome these issues.
The implementation followed the conceptual design specified in Chapter 4 as closely
as possible. The high-level conceptual architecture was, to a great extent, adhered to.
However, there were unforeseen complications that resulted in the need to change some
of the details of the system design.
5.2 Web Service
The implementation of the server-side part of the system was more complicated than
initially foreseen.
Notably, it became clear during implementation that the PGKit broker had a number
of severe shortcomings that affected the way the system would communicate and use
its functions. These limitations were not evident beforehand as they were not explicitly
documented nor did they become clear upon reading the documentation regarding the
framework.
There were also complications with the Isabelle theorem prover itself (in terms of needing
to process and transform its output into a suitable format) and issues with the PHP5
pre-processor that affected the way the system was implemented.
In addition to the unforeseen complications due to problems with the dependencies,
there were several design decisions that had to be taken in terms of how one would
most appropriately provide the required functions as per the requirements. All these
issues will be discussed in the next subsections.
5.2.1 PGKit Issues
There were severe complications with using the PGKit [4, 5, 45, 46] broker for this
project. The following is a discussion of the problems experienced, and the decisions
taken to mitigate these.
5.2.1.1 Available version non-working
Neither the binary version of the PGKit broker available on the PGKit website [45] nor
the version in PGKit's CVS repository worked at the time the project work
started. Although the broker application would start to run, it would fail to parse
the majority of valid Isabelle proof scripts given to it. Additionally, for the few proof
scripts it managed to parse, it would not allow step-by-step, incremental processing of
the script, i.e. it would only allow the whole script to be parsed at once.
Time was spent on trying to understand what the problems were and where they lay, by
inspecting log files, trying different CVS versions of Isabelle and recompiling the
PGKit broker from the source code, without any success. The authors of the
PGKit broker, David Aspinall and Christoph Lüth, were contacted directly to enquire
about the problems experienced as no success had been achieved in trying to run the
application. The reply received from Christoph Lüth indicated that the PGKit broker's
code itself was currently in a non-working state. However, he immediately made
changes to the code so that the PGKit broker again was in a runnable state, and
uploaded the working binary version to the PGKit broker's website. A short extract of
Christoph Lüth's reply is shown below.
I fixed it up a bit (it has suffered some bit-rot over the last few months or
so).
It now works with the latest development version of Isabelle again, or at
least does not fall over quickly [...] You can parse files (both of the examples
you provided), and you can (slowly, cautiously) step through files
It was decided that fixing the PGKit problems ourselves was outside the scope
of the project, and would likely take a fair amount of time more appropriately spent
elsewhere.
5.2.1.2 Instability
Christoph Lüth's modifications to the PGKit broker's code made the application runnable.
However, it remains unstable. The problems encountered with its stability were:
1. The PGKit broker suffers from timeout errors when creating Isabelle theorem
prover instances. As a result, the PGKit broker might have to be restarted several
times until it manages to handshake properly with the Isabelle theorem prover.
2. The PGKit broker does not flush the script buffer properly. According to the
documentation, it should be able to discard the current file at any stage and start
working on another. However, in practice this does not seem to work. In order
to close a file properly and open a new one, it was necessary to change the state
of all the objects in the buffer from processed to parsed before using the action
to discard an open file. If the file was discarded without ensuring that all objects
were reverted to the parsed state, it would not properly flush the object buffer
upon loading a new file. This caused object-processing requests in the newly
opened file to fail as it was unable to process the first object found in the buffer.
This was due to the object being a remnant from the previous file which had not
been flushed out properly. When such cases occurred the PGKit broker would
deadlock. The only way to get out of this deadlock was to terminate and restart
the broker. In order to overcome this shortcoming, the web service has to keep
track of what file was last opened, and what object ID the first line of the file has,
in order to retract to this stage before dismissing the file.
3. The PGKit broker suffers from intermittent unexplained internal errors (i.e. no
error log given nor any other details about what caused the error), which make
the system deadlock, thereby forcing a restart.
4. The PGKit broker does not recover gracefully from invalid process requests. These
situations can be problematic as messages might get lost over the network between
the web client and the web server (or PGKit broker), resulting in the client not
having the correct updated view of the proof script. If the client requests an action
to be performed on a removed object, the PGKit broker will deadlock.
5. The PGKit broker is prone to errors when requesting editing on a range of objects,
such as when users wish to undo a set of rule applications. If new objects have
been inserted into the range after initial loading of the le, as is often the case, and
an edit request is put on the range of objects, the PGKit broker intermittently
fails to properly determine what objects are in the range. In such cases, the
PGKit broker ends up overwriting all objects from the specified start of the
range all the way down to the end of the proof script rather than to the
specified stop position.
5.2.1.3 Few PGIP commands implemented
Another setback during implementation was that few of the commands outlined in the
PGIP specification document were actually implemented in the PGKit broker. The
PGKit documentation is quite sparse, and there does not exist a list of what functionality
is currently offered and what is not. As a result, some actions had to be performed
in a different manner in order to achieve similar results to what the missing commands
were supposed to do.
One of the non-implemented PGIP commands was one which is meant to return an
updated list of the proof script objects. As it is, there is no way for a client to ask for
a list of the current proof script after receiving the list when loading the le. To work
around this problem, the web client has to maintain and update a local buffer of the
script, which might get out of synchronization if messages are lost.
Yet another problem was that some of the commands that changed Isabelle system
preferences were not implemented. For example, it was not possible to enable the PGML-symbols
output preference on Isabelle through the PGKit broker. PGML [7, 46, 5] is a very small
XML language, contained in the PGIP specification, that encloses X-Symbols with XML
tags. In order to overcome this, the local copy of the PGKit broker source code was
modified to enable this function by default and recompiled for use in this project. Note
that PGML output does not turn the whole state display result into proper XML
structure, as it only wraps XML tags around X-Symbols. The rest of the state display
result is still textual and needs to be parsed/transformed by the web service into an
easily processable format. An example of a PGML marked-up X-Symbol is:
<symbol name="exists">\<exists></symbol>
5.2.1.4 Complexity of PGIP protocol
The complexity of the PGIP protocol, and the sheer size of the XML data being returned
after each PGIP command, made the protocol unsuitable for use by the web client itself,
as it would require extensive in-browser processing and introduce high overhead in terms
of the amount of data having to be sent over the network.
In order to reduce the amount of processing and size of messages being sent to and
from the web client, it was decided that the web server would transform PGIP replies
from the PGKit broker into a much simplified XML language more suited for the web
application's use. An example of the XML result returned to the web client after
creating the line apply (rule_tac [1] conjI) in the proof script is:
    <?xml version="1.0"?>
    <body>
      <remobj id="a275"/>
      <object id="a277" state="parsed" position="a26d">apply (rule_tac [1] conjI)</object>
      <object id="a278" state="parsed" position="a26d">
        <empty/>
      </object>
    </body>
In comparison, the original PGIP message this XML reply was generated from contained
31 XML tags (see Appendix A.1 for a printout of the PGIP message), excluding an XML
document definition element and containing body element tags. The condensed reply
created by the web service, including an XML document definition element tag and
containing body element tags, contains in total only 7 XML tag elements.
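How a client might fold such a condensed reply into the local script buffer mentioned in Subsection 5.2.1.3 can be sketched as follows. This is an illustration only and not the project's code: it assumes the reply has already been parsed into plain objects, the names applyReply, remove and upsert are hypothetical, and ordering by the position attribute is deliberately left out because its exact meaning is not spelled out in this section.

```javascript
// Hypothetical sketch: keep a local map of proof-script objects in step
// with the condensed replies from the web service.
// Assumption: each reply has been parsed into a list of operations, e.g.
//   { op: 'remove', id: 'a275' }
//   { op: 'upsert', id: 'a277', state: 'parsed', position: 'a26d', text: '...' }
function applyReply(buffer, operations) {
  for (const operation of operations) {
    if (operation.op === 'remove') {
      // Corresponds to a <remobj id="..."/> element.
      buffer.delete(operation.id);
    } else if (operation.op === 'upsert') {
      // Corresponds to an <object ...>...</object> element.
      buffer.set(operation.id, {
        state: operation.state,
        position: operation.position,
        text: operation.text,
      });
    }
  }
  return buffer;
}

// The reply shown above, expressed as operations.
const buffer = new Map([
  ['a275', { state: 'parsed', position: 'a26d', text: '' }],
]);
applyReply(buffer, [
  { op: 'remove', id: 'a275' },
  { op: 'upsert', id: 'a277', state: 'parsed', position: 'a26d',
    text: 'apply (rule_tac [1] conjI)' },
  { op: 'upsert', id: 'a278', state: 'parsed', position: 'a26d', text: '' },
]);
```

A map keyed by object ID keeps each update cheap, but it also shows the weakness noted earlier: if one reply is lost, nothing in the buffer reveals that it is out of date.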
In order to reduce the data being sent, information regarding the prover, broker and
display (i.e. the web service) components was taken out, as were PGIP envelope tags,
as they were of no use to the web client. Additionally, information regarding internal
sequence numbers, timestamps and other superfluous XML elements was also
stripped away in the transformation to reduce the message size.
Furthermore, as the overall aim of the simplified XML language was to reduce the size
of the XML data sent, XML namespace declarations and Document Type Definitions/XML
Schema declarations were not used in the simplified XML protocol.
5.2.1.5 PGIP missing a remove object command
Surprisingly, there is no mention in the PGIP specifications of any command to remove
an object from a proof script. This is an essential function that is needed in order to
allow users to retract/remove applied proof steps. In comparison, there are commands
for adding and editing objects.
To work around this limitation, our web service has to edit the requested removed
object to contain an empty String. Although this does not physically remove the object
from the PGKit broker's buer, it will not show up in the user display nor in the script
when the broker stores the script persistently. Another solution would be to edit the
broker so as to provide this functionality. However, this would introduce a function
not declared in the PGIP protocol and would ruin the whole point of using a formally
specified protocol.
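The effect of this workaround can be sketched as follows. The project's web service was written in PHP and relied on the broker to omit emptied objects; the Javascript below only models that behaviour, and the record shape { id, text } and the function names removeObject and visibleScript are hypothetical.

```javascript
// Hypothetical sketch of the "remove by emptying" workaround.
// Assumption: script objects are plain { id, text } records; real removal is
// replaced by an edit that sets the text to the empty string.
function removeObject(objects, id) {
  return objects.map((object) =>
    object.id === id ? { ...object, text: '' } : object
  );
}

// Emptied objects stay in the buffer but are skipped when the script is
// shown to the user or written out.
function visibleScript(objects) {
  return objects
    .filter((object) => object.text.trim() !== '')
    .map((object) => object.text)
    .join('\n');
}

const script = [
  { id: 'a1', text: 'lemma "P & Q --> Q & P"' },
  { id: 'a2', text: 'apply (rule_tac [1] impI)' },
  { id: 'a3', text: 'apply (rule_tac [1] conjI)' },
];
const afterUndo = removeObject(script, 'a3');
```

The buffer keeps its length, which mirrors the point made above that the object is not physically removed from the broker's buffer.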
5.2.2 Isabelle Issues
There were also complications with the Isabelle theorem prover itself, in terms of having
to process and transform Isabelle output into a suitable format for program processing
rather than for human reading.
5.2.2.1 Missing XML output feature
It was initially taken for granted that the Isabelle theorem prover provided an option to
make it return state displays (proof results) marked up in XML syntax for processing,
rather than the usual textual representation of state displays, which is meant for human
reading rather than application processing. This was based on reading the White Paper
on the Proof General Kit [4], which showed a state display marked up in XML, and on
browsing the Isabelle source code, where a documented output_XML function was
found.
However, upon contacting David Aspinall and Christoph Lüth about having difficulty
enabling the XML output, we were informed that this functionality had been taken out
of Isabelle a while ago. As to why it had been taken out, Aspinall said
that it "...hasn't received much attention as it makes less sense in the Isar interaction
mode." He did, however, e-mail a snippet of code from an old experimental version
that had the functionality enabled. After spending some time trying to get
the code to work, the attempt was abandoned. The decision to abandon the attempt
to make Isabelle output XML marked-up results was taken because:
1. The Isabelle code has changed significantly since the function was taken out, thus
it was not possible to merely cut-and-paste the code in.
2. The author of this project does not have any previous experience with the ML
programming language, thus it would be necessary to set aside time to learn the
basics of ML in order to implement the function.
The decision was taken that this diversion was not worth spending time on, and
we opted instead to do text matching and generate the XML at the web server.
An example of the generated XML mark-up of the results, by the web service, is shown
below. Appendix A.2 contains a comparison between the raw PGIP message containing
the raw Isabelle state representation vs. a corresponding XML message generated by
the web service.
    <?xml version="1.0"?>
    <body>
      <result id="a7a">
        <tree step="0" subgoals="1">
          <subgoal id="1">
            <given>
              <atom kind="free">P</atom>
            </given>
            <given>
              <bracket>
                <atom kind="free">P</atom>
                <symbol name="longrightarrow">longrightarrow</symbol>
                <atom kind="free">Q</atom>
              </bracket>
            </given>
            <goal>
              <atom kind="free">Q</atom>
            </goal>
          </subgoal>
        </tree>
      </result>
    </body>
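The kind of text matching involved can be sketched as follows. The project did this in PHP at the web service; Javascript is used here only for illustration. The sketch assumes one subgoal per line in Isabelle's ASCII notation, for example " 1. [| P; P --> Q |] ==> Q", and stops at separating assumptions from the goal. The function name parseSubgoal and the returned object shape are hypothetical, and breaking each expression down into atoms, brackets and symbols as in the listing above is not attempted.

```javascript
// Hypothetical sketch of text matching on one line of Isabelle's textual
// state display, e.g. " 1. [| P; P --> Q |] ==> Q".
// Simplification: assumptions are split on ';', which is only safe when no
// assumption contains a nested meta-level bracket or a ';' of its own.
function parseSubgoal(line) {
  const match = /^\s*(\d+)\.\s*(.*)$/.exec(line);
  if (match === null) {
    return null; // not a numbered subgoal line
  }
  const id = Number(match[1]);
  const body = match[2].trim();

  // Several assumptions: [| A; B |] ==> C
  const bracketed = /^\[\|(.*)\|\]\s*==>\s*(.*)$/.exec(body);
  if (bracketed !== null) {
    const given = bracketed[1].split(';').map((part) => part.trim());
    return { id, given, goal: bracketed[2].trim() };
  }

  // One assumption: A ==> C
  const arrow = body.indexOf('==>');
  if (arrow !== -1) {
    return {
      id,
      given: [body.slice(0, arrow).trim()],
      goal: body.slice(arrow + 3).trim(),
    };
  }

  // No assumptions at all.
  return { id, given: [], goal: body };
}

const subgoal = parseSubgoal(' 1. [| P; P --> Q |] ==> Q');
```

Here subgoal.given holds the two assumptions and subgoal.goal holds Q, the same split that the given and goal elements express in the generated XML.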
5.2.2.2 PGIP communication
The PGKit broker is not responsible for all the PGIP communication problems that
have been experienced. The CVS version of the Isabelle theorem prover used with the
system (dated 4 July 2007, and used on the recommendation of David
Aspinall as it was compatible with the PGKit broker) reports an error in the PGIP
wrapper code upon starting up a prover instance. As a result, the initial line in the
proof script, that of the theory file declaration, has to be processed twice. This step
must be performed any time the system backtracks to the first line of the script.
It has not been possible to find a CVS version of Isabelle that is compatible with
PGKit and at the same time does not have this error.
5.2.3 PHP issues
There were a few implementation issues relating to using the PHP pre-processor, which
are discussed below.
5.2.3.1 Simplexml and mixed content nodes
During the implementation, there were problems with how PHP's Simplexml functionality
deals with nodes with mixed content. The XML language was initially created for
marking up text documents as mixed content [50]. However, as the Simplexml functionality
currently works, one can get either the text content of a node or the sub-nodes
of a node, but not retrieve both at the same time. Additionally, Simplexml would not
dump the content of a node containing mixed content for manual processing, but
would only allow dumping the whole tree from the document root, which defeats the purpose
of using Simplexml to navigate the XML tree. As a result, stripping away XML element tags
to get the full content of a node (with mixed content) had to be done by the use of
regular expressions rather than by simply traversing the XML tree with PHP's Simplexml
functions.
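The regular-expression approach can be illustrated with a short sketch. The project's code was PHP and its actual expression is not reproduced in this section, so both the pattern and the function name textOfMixedContent below are assumptions, written in Javascript purely for illustration.

```javascript
// Sketch of recovering the full text of a mixed-content node by stripping
// element tags with a regular expression.
// Assumption: the fragment contains no comments, CDATA sections or '>' inside
// attribute values, which a regular expression this simple would mishandle.
function textOfMixedContent(xmlFragment) {
  return xmlFragment.replace(/<[^>]*>/g, '');
}

const fragment = 'P <symbol name="longrightarrow">longrightarrow</symbol> Q';
const plainText = textOfMixedContent(fragment);
```

The text outside the tags and the text inside them are kept in document order, which is exactly what could not be obtained from Simplexml in a single call.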
5.2.3.2 Security mode restrictions
The DICE system at the School of Informatics provides a web server that staff and students
can use to host web pages. The web server has the PHP pre-processor enabled for
use, thus it would seem that it would be possible to deploy the system there. However,
it became clear that the web server runs PHP5 in safe mode. Running in safe mode
causes file access and socket connection operations to be severely limited. This means
that the PHP scripts cannot access files outside the web directory and, more
importantly, cannot open socket connections to the PGKit broker, making the PHP
scripts unable to interact with it.
It was decided that the system had to be developed and tested on a separate system
outside of the DICE system.
5.3 Web Client
The implementation of the web client posed a fair share of challenges that had to be
properly addressed, which will now be discussed.
5.3.1 Creating and Displaying the proof hierarchy
The implementation of the functions responsible for creating and displaying proof hierarchies
involved much deliberation, as it was not straightforward to implement them.
The problems lay fundamentally with how the Isabelle system works.
5.3.1.1 Isabelle does not recall proof history.
Isabelle treats each new proof state as a new separate subgoal to prove, from the view
of the client, without any previous history of the whole proof so far. Thus there is no
hierarchical information of the proof structure available for use by the system. In order
to overcome this problem, a design decision was taken that the Javascript in the client's
web browser would build up the proof hierarchy on its own as it receives results. It does
this by keeping track of the current position in the proof script and previously rendered
objects in order to insert the new results at the appropriate place in the DOM. This
requires substantial processing by the Javascript interpreter, but is unavoidable as long
as Isabelle does not have the ability to display the full structure of a proof such as what
is available in Coq [31].
5.3.1.2 Isabelle's subgoal numbering
Isabelle's process of numbering subgoals is difficult to track as the labelled number does
not remain consistent. This makes it very difficult to keep track of the subgoals when
creating the proof hierarchy. This is more appropriately illustrated with an example:
Say we have two subgoals, call the first A, the second B. At the start,
subgoal A has the numeric label 1, and B the numeric label 2. If a backward
rule is applied to subgoal A that introduces two new subgoals (let us call
them subgoal A.1 and A.2), these are now numbered with labels 1 and 2
respectively, and the initial subgoal B now has the numeric label 3.
In order to overcome the problems associated with this, the web application keeps an
updated count of the respective subgoal numbers when creating the proof hierarchy. It
increments the subgoal numbers of all higher-numbered existing subgoals in the DOM when a new
subgoal is inserted, and likewise decrements the subgoal numbers of all higher-numbered
subgoals when a subgoal is closed.
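The bookkeeping can be sketched with the A and B example from above. The sketch models the open subgoals as an ordered list rather than as DOM nodes, so that a label is simply a position in the list; this representation and the function names applyRule and labelOf are assumptions made for illustration.

```javascript
// Hypothetical sketch of keeping subgoal labels in step with Isabelle.
// 'subgoals' is the ordered list of open subgoal names; the label of a
// subgoal is its 1-based position in that list.
function applyRule(subgoals, label, newSubgoals) {
  // Replace the subgoal at 'label' with the subgoals the rule introduces.
  // Every later subgoal shifts by (newSubgoals.length - 1): up when the rule
  // adds subgoals, down by one when it closes the subgoal outright.
  const index = label - 1;
  return [
    ...subgoals.slice(0, index),
    ...newSubgoals,
    ...subgoals.slice(index + 1),
  ];
}

function labelOf(subgoals, name) {
  const index = subgoals.indexOf(name);
  return index === -1 ? null : index + 1;
}

const start = ['A', 'B'];
// A backward rule on subgoal 1 introduces A.1 and A.2; B moves to label 3.
const afterSplit = applyRule(start, 1, ['A.1', 'A.2']);
// Closing A.1 shifts the remaining subgoals down: A.2 is 1 and B is 2.
const afterClose = applyRule(afterSplit, 1, []);
```

Deriving labels from positions keeps the increment and decrement rules in one place, whereas the DOM-based approach described above has to update each stored number explicitly.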
5.3.1.3 Repeated assumption listings
In relation to the point made that Isabelle does not recall proof history, the system
instead repeats all the assumptions for each new subgoal to prove. This causes problems
when creating the proof hierarchy in the web application, as box-proof notation is
intended to remove the need for re-listing assumptions and overwhelming the user with
superfluous data. It was clear that the web application should prevent the re-listing of
assumptions where the assumptions already exist in the parent states of a proof branch.
This was implemented by making the web client (the Javascript code) check the proof
hierarchy so far so as not to introduce already existing assumptions.
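A minimal sketch of such a check is given below. It assumes that each box in the proof hierarchy records its own assumptions and a link to its parent box, and that assumptions can be compared as plain strings; the object shape and the function name newAssumptions are hypothetical.

```javascript
// Hypothetical sketch: list only the assumptions that no ancestor box
// already shows.
// Assumption: each box in the proof hierarchy is { assumptions, parent },
// and assumptions are compared as plain strings.
function newAssumptions(parentBox, reported) {
  const seen = new Set();
  for (let box = parentBox; box !== null; box = box.parent) {
    for (const assumption of box.assumptions) {
      seen.add(assumption);
    }
  }
  return reported.filter((assumption) => !seen.has(assumption));
}

const root = { assumptions: ['P', 'P --> Q'], parent: null };
const child = { assumptions: ['R'], parent: root };
// Isabelle repeats every assumption; only 'S' is new below 'child'.
const fresh = newAssumptions(child, ['P', 'P --> Q', 'R', 'S']);
```

Comparing strings is the simplest possible test and would treat two differently spaced copies of the same expression as distinct, so a real client would need to normalize expressions first.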
5.3.1.4 Converting from XML to HTML
Before being able to update the DOM with new information generated by a proof step,
the data received in the XML had to be processed and its results converted into HTML.
Although having the web service return HTML instead of XML would remove the need
to convert at the client side, it was decided that it would make more sense that XML
was sent in order to separate display information (HTML) from processing information
(XML), and to facilitate interoperability. This solution would thus allow changes to how
results are rendered to be done at the client side, and allow other client applications to
utilize the web service.
For these reasons, a client-side function was created that transforms XML directly
into HTML, maintaining attribute data (including sub-nodes and text). The function
allowed specification as to what type of HTML tags it was to generate (DIV,
SPAN, etc.). This utility function was used for dynamically loading parts of the web
service response (e.g. expressions) for the elements where client-side processing was not
necessary.
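A conversion of this kind might look as follows. The sketch does not reproduce the project's function: it assumes the XML has already been parsed into simple node objects, and the choice to carry the element name over as a CSS class and the attributes as data-* attributes is an assumption made for illustration, as are the names xmlToHtml and escapeHtml.

```javascript
// Hypothetical sketch of a generic XML-to-HTML conversion.
// Assumption: the XML has already been parsed into nodes of the form
// { name, attributes, children }, where each child is a node or a string.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function xmlToHtml(node, tag) {
  if (typeof node === 'string') {
    return escapeHtml(node);
  }
  // The XML element name becomes a CSS class; attributes are kept as
  // data-* attributes so no information is lost.
  const attributes = Object.entries(node.attributes || {})
    .map(([key, value]) => ` data-${key}="${escapeHtml(value)}"`)
    .join('');
  const children = (node.children || [])
    .map((child) => xmlToHtml(child, tag))
    .join('');
  return `<${tag} class="${escapeHtml(node.name)}"${attributes}>${children}</${tag}>`;
}

const goal = {
  name: 'goal',
  attributes: {},
  children: [{ name: 'atom', attributes: { kind: 'free' }, children: ['Q'] }],
};
const html = xmlToHtml(goal, 'span');
```

Passing the tag name as a parameter corresponds to the specification of the HTML tag type described above, so the same data could be rendered with block-level or inline elements.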
5.3.1.5 Javascript timeout
During the testing of the system, a problem came to light relating to opening
medium-to-large proof scripts, such as scripts containing five or more theorems with somewhat
long proofs.
When one opens an existing proof script in the web client, the whole proof script is
immediately processed and rendered for view in order to facilitate proof-by-pointing.
However, if the proof script contains numerous complicated proofs, the Javascript will
take some time in calling the web server, retrieving and processing the results, before
eventually creating the proof hierarchy in the DOM. Most web browsers put a time
limit on how long a piece of Javascript code can run before it is suspected to have
deadlocked and the user is warned (usually around 10-15 seconds).
If the proof script to be processed is of substantial size, Isabelle might take a while to
process the whole script. Additionally, the Javascript code needs to render the results
which can also take a while. Thus, the Javascript for our system can run longer than
the browser's timeout limit. In such cases the user will usually have to reply to a dialog
asking if they want the Javascript execution to continue, which is a nuisance to the user.
However, this problem is not avoidable as long as the system renders the whole proof at
initial loading. Dividing the processing of the script into chunks of 5-10 objects would
not help in this situation either, as the underlying calling Javascript function still would
take the same time (or most likely longer time than the original solution).
Note that performing proof steps, undoing proof steps and adding new theorems to
prove to an already opened proof script, should not lead to this issue as none of these
steps involve re-rendering the whole script.
5.3.2 Point-and-click Proof Creation
The issues experienced with implementing the point-and-click proof creation will now
be discussed.
5.3.2.1 Available proof rules
The choice of natural deduction rules to provide access to and in what way they are
allowed to be applied (forward, backward or both) was a design decision that required
significant attention. The set of rules chosen, and the way they are allowed to be
applied, was decided based on a survey of introductory logic material [10, 13, 27]. All
the standard rules of natural deduction are present. However, there are limitations
on how the different rules are allowed to be applied. These were imposed in order
not to confuse novice users, as many of these rules are complicated and unnatural to use
(though still theoretically possible) in certain ways.
The following proof rules are available to the users of the system (see Appendix B for
definitions of these rules):
Backwards: ∧ε1, ∧ε2, ∧i, ∀ε, ∃ε, ∨ε, =ε1, =ε2, ¬ε, →ε
Forwards: ∧i, →i, ∃i, ∀i, =i, ¬ε, ¬i, ∨i1, ∨i2, classical
Additional rules: PBC (shown in Figure 5.1), LEM (shown in Figure 5.2)
[35]
Figure 5.1: Proof by Contradiction
[35]
Figure 5.2: Law of Excluded Middle
Some expert users might feel that limiting the number of rules and the ways in which the rules
are allowed to be applied reduces some of the system's usefulness. Expert users often
rely on using more advanced rules that do not appear as basic natural deduction rules in
logic textbooks, in order to reduce the number of proof steps needed in a formal proof.
However, these rules will in most cases only confuse novice users, as well as making
it more difficult to move between different interactive theorem provers at a later stage
(we cannot expect non-generic rules to be readily available in all provers). It was felt
that a trade-off had to be made in order to make the system useful for complete novices without
confusing them and at the same time be of use to more experienced users.
Note that there are two so-called additional rules that are available for simplifying proofs:
Proof by Contradiction (PBC) and Law of Excluded Middle (LEM). These were added due
to their widespread use in natural deduction proofs.
5.3.2.2 Proof dependencies and reuse
Ideally, we would like to have the ability to use proved theorems in other proofs in
order to reduce the proof steps needed. However, due to the complication of converting
theorem declarations in Isabelle from meta-level to object level, this was omitted. If
all theorems were forced to be declared at the object level, then re-use would not be
problematic, as the cut_tac command in Isabelle (which enables proved theorems to be
asserted as new assumptions into a goal being proven) is already used for the LEM rule
application, which is declared at the object level.
5.3.2.3 Isabelle's lack of labelling
Isabelle does not allow any labelling of expressions in the proof script. This is unfortunate,
as the box-style notation relies on the use of labels for human readability
(see Subsection 2.4.1). It was deemed that labelling was critical in order to maintain
the benefits sought by using box-proof notation, thus a solution had to be found that
provided labelling. The solution devised was that the Javascript run at the client has
to create unique labels for assumption expressions so as to make it easier for users to
follow the box proof. However, this was not deemed necessary for goal expressions, as
there would only be one single goal to prove at the different levels in the proof, thus
this would not cause any confusion as to what goal a step is working on.
Although this solution solved the problem of labelling expressions, it does not successfully
address how to properly provide explanations as to what assumptions a proof rule
worked on to arrive at a new assumption (in the case of forward steps) or goal (in the case
of backward steps). For example, using ¬ε forward on assumptions labelled 1 and 3 in order to
arrive at the new assumption labelled 4 would be explained as ¬ε(1, 3) in box-proof
notation. The problem is that since labelling is not used in the proof script itself, the
system has no way to link the variable instantiations in the Isabelle proof rules to the
existing assumptions. In fact, instantiating variables in Isabelle proof rules can only be
done by explicitly stating the binding rather than referring to some label. For example, the
command for applying the ¬ε rule forwards on the assumptions P and ¬P in order to arrive
at False would be apply (frule_tac [1] P="P" and R="False" in notE).
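Generating such a command from a point-and-click selection might be sketched as follows. The function name buildApplyCommand and its parameters are hypothetical, and the sketch assumes that the selected expressions contain no double quotes that would need escaping.

```javascript
// Hypothetical sketch of turning a point-and-click selection into an
// Isabelle tactic line of the shape quoted above.
// Assumption: 'instantiations' maps rule variables to the selected
// expressions, in the order they should appear.
function buildApplyCommand(tactic, subgoal, instantiations, rule) {
  const bindings = Object.entries(instantiations)
    .map(([variable, expression]) => `${variable}="${expression}"`)
    .join(' and ');
  // Rules applied without explicit bindings need no "... in" part.
  const inPart = bindings === '' ? '' : `${bindings} in `;
  return `apply (${tactic} [${subgoal}] ${inPart}${rule})`;
}

const command = buildApplyCommand('frule_tac', 1, { P: 'P', R: 'False' }, 'notE');
```

The bindings carry the expressions themselves, not their labels, which is why the labels have to be recorded separately if they are to survive in the proof script.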
Figure 5.3: Applying forward rule
One way to provide proof rule explanations is to capture information as to the labels of
the selected assumptions in the web client when a user applies a rule (shown in Figure
5.3). This information can then be added as an explanation of what lines the proof rule
worked on. However, due to the aforementioned lack of a labelling mechanism, this
information is not incorporated into the proof itself as it is not Isabelle-relevant, and
so would be lost when re-loading the file. This would also affect the user any time an
existing proof script was opened as it would not contain proper explanations other than
the name of the rule applied.
Two solutions were identied:
1. Add Isar text comments to the proof script at the time of the rule application,
indicating the labels of the assumptions used with the rule to be extracted at a
later time.
2. Create a data-structure within the Isabelle proof script that would contain a list
of expressions with assigned labels. This data-structure would then be used to
annotate unlabelled expressions by iteration and comparing content, before being
added to the proof hierarchy at the web client.
Neither of the above solutions is ideal. The first solution is the easiest to implement.
However, it does not utilize the explanations added other than displaying their text content
in the DOM. In the case where the web application's labelling system is changed, the
labels in the explanation might no longer refer to the right expressions. The benefit of
this solution, however, is that it is light-weight and does not involve much processing or
introduce much overhead to the messages being sent to the server. The second solution
is heavy-handed and requires extensive processing on both the client and server side to
ensure correctness and to utilize.
Upon weighing up the strengths and weaknesses of the two solutions, it was decided
that the use of lightweight comments would be sucient, as it would take less time to
implement, faster for the system to perform and would not make any dierence in terms
of the information being expressed in the explanation for the user.
As a side-note, a theorem prover such as Coq, which supports the use of labelling, would
not have incurred this problem.
5.3.2.4 Replaying proof
Related to the issue of labelling is that of replaying proofs upon opening a theorem file.
Upon loading a file, the web client requests the whole script to be parsed in order to
create the proof hierarchies necessary to perform point-and-click proofs. The
proof hierarchy is created by traversing the returned proof results line by line, and
reacting to them accordingly (following rules as to what will be added to the DOM). Without
having the labels committed to the proof script itself, the client would not be able to
determine what each rule application's side explanation was in terms of assumptions
used as premises.
To overcome this, the system was implemented so that if a comment is found following
the proof step in the script, then it uses that line as explanation (similarly to what it
would do when performing point-and-click rule applications). If there is no comment
line, then it falls back to only giving the proof rule name used. This way the system
works for scripts that do not have comments as well, falling back to somewhat less
informative explanations. An example of a commented line is
apply (frule_tac [1] P="P" and R="False" in notE) (*¬E (3,1)*).
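The fallback described above can be sketched as a small Javascript function. This is a minimal sketch rather than the system's actual code: the function name explanationFor and both regular expressions are assumptions, and it presumes that the explanation, when present, is a (* ... *) comment at the end of the script line.

```javascript
// Extracts a side explanation from one line of an Isabelle proof script.
// Assumption: an explanation, if present, is a trailing (* ... *) comment.
function explanationFor(scriptLine) {
  var comment = /\(\*\s*(.*?)\s*\*\)\s*$/.exec(scriptLine);
  if (comment) {
    return comment[1];
  }
  // Fallback: take the last identifier before the closing parenthesis
  // as the rule name (e.g. "notE").
  var rule = /(\w+)\s*\)\s*$/.exec(scriptLine);
  return rule ? rule[1] : "";
}
```

A line without a comment therefore yields only the rule name, which corresponds to the less informative explanation mentioned above.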
5.3.2.5 Instantiation of variables in proof rules
During the implementation, a decision had to be taken as to what extent the system
should instantiate variables in proof rules (this issue is linked to the discussion of la-
belling in the previous section). Usually, in the Isabelle community (and most other
theorem prover communities), it is considered bad style to provide more explicit
instantiations of variables than necessary. This is due to the potential effects a change in a
proof step can have on subsequent proof steps. By keeping variable instantiation to a
minimum, the Isabelle theorem prover can instantiate these variables internally, searching
for variables that satisfy the rule conditions. Again, if Isabelle utilized labelling of
expressions, this would not be as much of an issue.
In our system, the decision was taken to instantiate as much as possible in terms of the
rule's assumptions. It would not be necessary to specify the goal variable of the proof
rule as using appropriate Isabelle tactics would specify what subgoal to work on (and
thus ensure that only one goal is ever in question).
Further complications to variable instantiations included the need to use λ-expressions
in the variable instantiations for the ∃ε and ∀i rules in the case that the expressions
contained multiple quantifiers. For example, in order to apply the ∃ε rule forward on the
∃y∀x. P (x y) assumption, the Isabelle command would be
apply (rule_tac [1] P="\<lambda> y . \<forall> x . (P x y)" in exE).
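To make the shape of these commands concrete, the following sketch assembles an apply command from a tactic name, a subgoal index, a list of variable bindings and a rule name. The function buildApply and its parameter layout are hypothetical and only illustrate the string format used in the examples above; it assumes at least one binding is supplied.

```javascript
// Builds an Isabelle "apply" command with explicit variable instantiations.
// tactic:   e.g. "frule_tac" or "rule_tac"
// subgoal:  1-based index of the subgoal to work on
// bindings: array of [variable, expression] pairs (assumed non-empty)
// rule:     Isabelle rule name, e.g. "notE"
function buildApply(tactic, subgoal, bindings, rule) {
  var parts = [];
  for (var i = 0; i < bindings.length; i++) {
    parts.push(bindings[i][0] + '="' + bindings[i][1] + '"');
  }
  return "apply (" + tactic + " [" + subgoal + "] " +
         parts.join(" and ") + " in " + rule + ")";
}
```

For instance, buildApply("frule_tac", 1, [["P", "P"], ["R", "False"]], "notE") produces a command of the same shape as the ¬ε example given earlier.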
5.3.2.6 Closed subgoals
Isabelle does not notify users when a branch (a subgoal) of a proof is closed. The
only indication that a subgoal has been closed is obtained by counting the number of
subgoals in the current proof state and comparing it to the number of subgoals in the
previous proof state in order to check if there is a decrease in count. As a result, the
functions in the Javascript code responsible for creating the proof hierarchy have to track
subgoal counts to determine branch closures.
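The count comparison can be sketched as follows. This is an illustrative sketch only: the function names are assumptions, and it presumes that the subgoal count of each proof state has already been extracted from the prover output.

```javascript
// Number of subgoals closed between two consecutive proof states
// (0 if the count did not decrease).
function closedSubgoals(previousCount, currentCount) {
  return currentCount < previousCount ? previousCount - currentCount : 0;
}

// Given the subgoal count after each proof state, returns the indices
// of the states at which at least one branch was closed.
function closingSteps(counts) {
  var closing = [];
  for (var i = 1; i < counts.length; i++) {
    if (closedSubgoals(counts[i - 1], counts[i]) > 0) {
      closing.push(i);
    }
  }
  return closing;
}
```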
5.3.2.7 Closing subgoals
Although Isabelle allows working on subgoals in any order, it does not allow users to
readily close subgoals non-linearly (more specifically, by applying an assumption). It does
not have a command that enables a one-step closure of a subgoal lower down on the
stack. In order to remedy this, the system needs to introduce a lemma that mimics
closure of a proof branch and that can be applied as a regular rule, which allows it to be
targeted at specific subgoals. The disadvantage with this solution is that the introduced
lemma has to be added to every proof script that is to be used in the system.
Another issue regarding closing subgoals is that some rule application commands in
Isabelle are destructive (notably erule and drule) in that they can delete assumptions that
are used as well as close a subgoal implicitly. In order to maintain consistency, the
system will only permit the use of Isabelle's backward and forward commands (the exceptions
to this are the assumption lemma, which requires a destructive command, and
the cut_tac command used in the LEM rule application). By removing these commands,
and attaching assumption applications to the allowed commands where needed, there
should not be any cases where subgoals are closed without the user specifically closing them.
5.3.2.8 Undoing a proof step
The issue of undoing/removing proof steps also required consideration. The problem
at hand was answering the question as to how far in the proof script an undo function
should revert to. Due to the linear nature of proof scripts, rule applications are added
one after the other to the script, disregarding what subgoals they are applied to.
Removing a range of objects from a given position to the end of the theorem proof
is problematic in that it can remove proof steps that are not related to the subgoal
branch that the user wants to retract actions from. That is, there are dependencies in the
underlying data-structure that do not relate to the proof representation itself.
Deleting just single steps at a time is problematic as well, as the subsequent steps in
the script will be incorrect (e.g. the subgoal numbering will be wrong).
It was determined that the best solution would be to delete from the position of the
object sought to be undone, all the way to the end line of the theorem proof. However,
this side effect might not make much sense for the user (i.e. the user might be confused
as to why removing proof steps in one branch of a proof might remove steps applied to
another unrelated branch of the proof).
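The chosen behaviour, and the side effect it causes, can be illustrated with a short sketch. The step objects with command and branch fields are a hypothetical representation and not the system's actual data-structure; the collateral list simply makes visible which removed steps belonged to other branches.

```javascript
// steps: proof steps in script order, each tagged with the branch it
// acts on (the shape { command, branch } is hypothetical).
// index: position of the step the user asked to undo.
function undoFrom(steps, index) {
  var removed = steps.slice(index);
  var target = removed.length > 0 ? removed[0].branch : null;
  return {
    kept: steps.slice(0, index),
    removed: removed,
    // Steps lost from branches other than the one being undone.
    collateral: removed.filter(function (s) { return s.branch !== target; })
  };
}
```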
5.3.2.9 Web browser compatibility
Dealing with browser compatibility is a complicated issue. Different web browsers
implement the W3C Web standards (grouped under the W3C Document Object Model
Architecture heading) [53] differently, and vary in how much of the specifications
of the different standards they follow. Below is a list of 4 different web browsers
(all with differing web layout engines), and how well the system works with each of
them.
1. Mozilla Firefox 1.5 and 2.0 - All the functionality of the system is available, and
the web page is rendered correctly as envisioned. The system was designed with
this browser in consideration.
2. Apple Safari 3.0 - Most of the functionality is available and working. The only flaw
when using the web application in this browser is that the tool-tip hints that are
meant to help the user do not seem to appear. This is likely due to a documented
problem with the Safari browser in regards to how it deals with CSS overlays in
terms of z-index levels. Apart from this relatively minor issue, the rest of the
web page is functional and renders correctly.
3. Opera 9.0 - Most of the functionality is available. However, there are issues
regarding how the browser deals with the overflow CSS property. As a result,
there are problems with the rendering regarding HTML tag content overflowing
past its box and overlapping content in sibling tag boxes. This is especially evident
in the buttons on the menu, and in rounded corners cropping
their text content.
4. Microsoft Internet Explorer 7.0 - As it is, the system is not usable in this web
browser. There are several reasons for this, listed below:
(a) W3C's CSS standards [48] are not strictly followed. Several CSS properties
are not supported, and the browser relies on the use of non-standardized CSS
properties specific to the Internet Explorer web browser.
(b) Non-standard XML processing. Internet Explorer's XML parser works somewhat
differently from the other browsers' XML parsers in terms of how it
interprets text nodes and node attributes.
(c) Plug-in incompatibility. Although the jQuery Javascript library is compatible
with Internet Explorer, several of its plug-in libraries are not. This is
mainly due to the problems relating to CSS and XML processing, as mentioned
above.
(d) Lack of proper tools to debug code and navigate the DOM for inspection,
like the Firebug plug-in available for Mozilla Firefox.
Attempts were made to remedy the problems associated with the Internet Explorer web
browser. However, given the difficulty of debugging the browser's Javascript environment
and inspecting the DOM, it was decided that the time that would need to be allocated
to this was not worthwhile given the limited time span of the project. Additionally,
problems with the Internet Explorer browser in terms of AJAX web applications seem
to occur frequently (e.g. ProofWeb does not support it properly either) [31].
5.3.3 Message Ordering
There were complications involved with controlling what order messages are sent and
processed in the web application. Ideally, we would like the message passing at the
client side to be asynchronous so as to keep the system from locking. However, the
procedural stateful communication required by the broker comes as a contrast to the
stateless asynchronicity aimed at for the web client.
A solution had to be devised that would allow ordering of messages where needed,
without forcing the whole system to block as with pure synchronous messaging.
5.3.3.1 Initial solution
The initial solution was to create recursive functions that would deconstruct a supplied
array containing AJAX call requests. Upon reaching an array containing a single ele-
ment, the AJAX call contained would be called, and the results returned to the parent
recursive call. The parent recursive call would then trigger its AJAX call, and pass the
results so far upwards. This solution was both processing-intensive and cumbersome, and
degraded the responsiveness of the system.
5.3.3.2 Revision 2: callback functions
The next improvement was to use callback functions. The jQuery Javascript library
allows AJAX calls to have associated callbacks that are executed upon completing
and receiving a reply to a query. This allows nesting of AJAX calls in the callback
functions to ensure that calls are executed and processed in a certain order. However,
this solution did not work well for situations where calls are made procedurally in
loops, as there would be no guarantee that a query executed in the first iteration would
complete processing its callback function before the one for the second iteration.
5.3.3.3 Revision 3: ajaxQueue plug-in
Relatively late in the implementation stage (8 August 2007), the source code to a
new jQuery plug-in called ajaxQueue (written by the jQuery library's main author)
was released on the jQuery message board. This allowed for queueing up AJAX calls,
thus making sure that one AJAX call function (and callback function) would complete
before the next one in the queue was executed. The plug-in managed to enforce this
ordering without the web browser blocking as with synchronous messaging. The queueing
could be bypassed, if required, by regular AJAX calls.
This plug-in solved several of the issues experienced with message ordering. It was
unfortunate that this plug-in did not appear until late in the project, when much time
had been spent on solving the message ordering. The web client code was changed to
utilize ajaxQueue, which resulted in reasonable improvements to processing time.
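The ordering guarantee that queueing provides can be illustrated with a minimal sketch. This is not the code of the ajaxQueue plug-in; makeQueue, enqueue and the send parameter are hypothetical names, and send stands in for whatever function actually issues the AJAX request and calls done(result) once the reply has arrived.

```javascript
// Minimal sequential request queue. `send(request, done)` is a
// hypothetical transport function that must call done(result) exactly
// once when the reply has been received.
function makeQueue(send) {
  var waiting = [];
  var busy = false;

  function next() {
    if (busy || waiting.length === 0) {
      return;
    }
    busy = true;
    var job = waiting.shift();
    send(job.request, function (result) {
      job.callback(result); // finish this callback first
      busy = false;
      next();               // only then start the following request
    });
  }

  return {
    enqueue: function (request, callback) {
      waiting.push({ request: request, callback: callback });
      next();
    }
  };
}
```

Because a request is only sent once the previous callback has returned, requests enqueued in a loop are processed in order, while the caller itself never blocks.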
5.3.4 User Interface
The implementation of the user interface aimed to keep the design simple so as not to
confuse the users. The user interface underwent rapid and incremental changes in order
to improve its usability, based on weekly discussions with the project supervisors. A
screenshot of the system is shown in Figure 5.4 for reference in the following discussions.
Figure 5.4: Guide to user interface
5.3.4.1 Expanding proofs
Web pages naturally expand vertically rather than horizontally. Thus it is difficult to
cater for the horizontal expansion that box-proofs might introduce, especially as HTML DIV
elements need to have specified widths set for most web browsers to render them properly.
As the content of the DIV grows, the DIV itself will not normally expand in width to cater
for this, as the width is already set. It is possible to grow a DIV element with
the overflow CSS property set to auto, which will override the set width if necessary to
cater for growing content. However, the Mozilla Firefox web browser does not deal with
this CSS property gracefully, inflicting a 2-3 second browser refresh delay each time a
DIV width needs to be dynamically changed due to overflow. It is a noticeable lag that
degrades the feeling of responsiveness.
As a result, the amount the box-proof can expand horizontally is limited. In the case of
a large number of nested boxes in the proof, the web client has a function that allows
the user to switch rendering styles so that sibling boxes are introduced vertically rather
than horizontally.
5.3.4.2 Colour scheme
The colour scheme underwent several changes. The initial test versions of the web client
that were demonstrated used somewhat dark colours, which made the system difficult to focus on.
As a result of the discussions, the colour scheme was revised to use lighter shades and
complementary colours.
Furthermore, the user interface was tested for appropriateness for users suffering from
colour blindness through the use of Vischeck [17], a vision simulator tool. The system
was checked against deuteranope, protanope and tritanope vision, which are the most
commonly occurring colour deficiencies. The result of the test was that the system was
deemed to be fully usable for users suffering from these colour deficiencies.
5.3.4.3 Help clues
The system provides hints to the user as to what the buttons in the user interface do.
In the case of objects that apply proof steps, the underlying natural deduction rule is
displayed in a small box next to the button. Some further hints are given about how
to apply the rule in the system, as indicated by the note shown in Figure 5.5. This
should prevent the need for users to refer to a separate user guide in order to understand
the rule's use.
Figure 5.5: Help clue
5.3.4.4 Box hierarchy
The user interface utilizes drop-shadows on proof boxes so as to give the user a sense of
depth/height visualization in order to emphasize the proof hierarchy. This should aid
the user in getting a feeling for where they are in a proof.
5.3.4.5 Menu panel vs. Toolbar
Two different layout-styles for the buttons were considered. One was to group clickable
actions into drop-down menus located in a horizontal toolbar at the top of the screen.
The other was to group actions into toggleable sub-panels in a movable accordion
menu (a menu that expands and contracts upon opening and closing menu headings).
The solution decided upon was to use the accordion menu, shown in Figure 5.6. This
was to mimic the layout used in systems such as Adobe Photoshop, where tool buttons
are visible and easily accessible. Additionally, this menu can be moved around on the
screen to suit the preference of the user (by default it is placed on the left hand side of
the screen). The Photoshop-like menu was further enhanced by providing the ability to
hide away groups of actions that are not currently used (toggleable panels). It was felt
that this design would go better with the layout of the rest of the user interface, and
would be quicker to use than a horizontal toolbar (panels of similar buttons are kept open
instead of the user having to navigate the nested structure each time).
Figure 5.6: Menu panel
5.3.4.6 Layout of proofs
Two different layout-styles of the proof-script rendering were considered. One approach
was to separate each theorem proof in the script into separate internal page tabs.
Another solution was to render all proofs in a scrollable window. It was felt that
dividing it into tabs would make the system overly complicated for the user (e.g. more
interface objects to learn and more clicks involved), and would involve more
Javascript processing client-side which might reduce the responsiveness of the system.
Thus, the decision was to lay out theorem proofs as a scrollable script.
5.3.4.7 Viewable proof script
It was decided that it should be possible to view the underlying Isabelle proof script,
if the user so wished. This would be beneficial for a user that wished to gradually learn
Isabelle proof commands. However, the script is by default hidden from the user, and
the user is not required to have any knowledge of Isabelle to use the system. An
example rendering of the proof script is shown in Figure 5.7.
The box-proof is shown on the right, and the corresponding Isabelle script on the left.
Figure 5.7: Proof script
5.3.4.8 Graphical confirmation of finished proofs
Users should be given confirmation that a proof has been finished. However, this should
be done in an unobtrusive way. The solution taken was to colour completed proofs
green (as opposed to the grey colour of proofs when not completed). This
was meant to utilize the common human association of the colour green with "OK"
or with something being correct/allowed (e.g. in traffic lights).
5.3.4.9 Show/hide proofs and proof branches
The ability to maximize and minimize proof and proof branch boxes should make it
easier to hide away information that is not currently relevant for the user, thus reducing
the information overload that might be inflicted. Examples are minimizing completed
proofs and proof branches not currently worked on.
Additionally, the ability to maximize and minimize was extended to be used on user
interface dialogs as well, such as the main menu panel and the dialog for viewing the
actual Isabelle proof script text.
5.3.4.10 Meta-variables
Initial implementations included Isabelle variables quantified at the meta-level in the
rendering of the proof scripts (e.g. when applying ∃ε). However, it was felt that this
was not necessary and would only confuse the users as they are Isabelle specic. Thus,
their display was taken out of the system.
5.3.4.11 Drag-and-drop vs. clickable expressions
Applying forward proof rules often involves selecting premises (assumptions) that have
appeared in the proof so far. Our initial solution for applying proof rules was to:
1. Make the user press the button of the rule to be applied, resulting in a pop-up
box appearing.
2. The user would click and drag the premises to be used as rule parameters and
drop them in fields appearing in the pop-up box.
3. Finally the user would click a button to apply the rule.
However, upon discussion, it was decided that this process was too tedious. The solution
decided upon and implemented instead was to:
1. Make the user click on the premises to be used as rule parameters. The selected
premises would be marked in red (shown in Figure 5.8) with a number appearing
to indicate the order they were selected (i.e. if the second assumption were to be
selected next, it would be labelled 2 in a red box).
2. The user would click the button of the sought rule to apply it.
This solution would reduce the number of clicks necessary in order to apply proof rules,
and also reduce the possibility of misunderstanding how to use the system.
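The bookkeeping behind the click-to-select approach can be sketched as below. This is only an illustration with hypothetical names (makeSelection, click, position, ordered); in particular, the assumption that clicking a selected premise again deselects it is not stated in the text.

```javascript
// Tracks clicked premises in the order they were selected.
function makeSelection() {
  var labels = [];
  return {
    // Clicking selects a premise; clicking it again deselects it
    // (the deselect behaviour is an assumption).
    click: function (label) {
      var at = labels.indexOf(label);
      if (at === -1) {
        labels.push(label);
      } else {
        labels.splice(at, 1);
      }
    },
    // 1-based number shown next to the premise, or 0 if not selected.
    position: function (label) {
      return labels.indexOf(label) + 1;
    },
    // Labels in the order they would be passed to the rule.
    ordered: function () {
      return labels.slice();
    }
  };
}
```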
Figure 5.8: Selecting assumption
5.3.4.12 Selecting subgoal to work on
In order to select a subgoal to work on, the user has to click the empty area within a
proof box. Earlier versions of the implementation involved only having to hover over
a box in order to select it. However, this was problematic when wishing to apply a
proof rule as navigating to the menu could involve temporarily hovering over another
subgoal, thus inadvertently selecting it. Forcing the user to click the subgoal would not
have this problem.
5.3.4.13 Showing natural deduction rule names
In order to keep in style with hiding away prover syntax, the side explanations of how a
statement was arrived at should use the general rule names of natural deduction rather
than Isabelle rule names. This means that the system has to transform the
generated explanations before inserting them in the DOM for display.
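Such a transformation could be as simple as a lookup table, sketched below. The table contents and the function names are assumptions made for illustration; the Isabelle rule names on the left are standard ones, but the display names chosen on the right are hypothetical.

```javascript
// Hypothetical table from Isabelle rule names to natural deduction names.
var RULE_NAMES = {
  notE: "¬E", notI: "¬I",
  exE: "∃E", exI: "∃I",
  allE: "∀E", allI: "∀I",
  impI: "→I", conjI: "∧I"
};

// Returns the display name for a rule, or the Isabelle name if unknown.
function displayRuleName(isabelleRule) {
  return RULE_NAMES.hasOwnProperty(isabelleRule)
    ? RULE_NAMES[isabelleRule]
    : isabelleRule;
}

// Builds a side explanation such as "¬E (3,1)" from a rule and the
// labels of the lines it was applied to.
function sideExplanation(isabelleRule, labels) {
  var name = displayRuleName(isabelleRule);
  return labels.length > 0 ? name + " (" + labels.join(",") + ")" : name;
}
```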
5.3.4.14 Using mathematical symbols
When rendering proof results received from Isabelle (after first transforming from XML
to HTML) the system should express the X-Symbol instructions with proper logical
characters rather than the text given by the prover (e.g.
<span class="symbol" type="forall">forall</span> should be displayed with the character ∀
in the client's browser). In order to accomplish this, the Javascript code has to replace
the content of all symbol tags with respective HTML character codes for display.
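A string-based sketch of this replacement is given below. It is an illustration under stated assumptions rather than the system's code: the SYMBOLS table and the function renderSymbols are hypothetical, the markup is assumed to have exactly the span shape shown in the example, and the real client would more likely operate on DOM nodes than on strings.

```javascript
// Hypothetical table from symbol names to HTML character entities.
var SYMBOLS = {
  forall: "&forall;",
  exists: "&exist;",
  not: "&not;",
  and: "&and;",
  or: "&or;"
};

// Replaces the text content of every symbol span with its entity.
// Spans with unknown symbol names are left untouched.
function renderSymbols(html) {
  return html.replace(
    /(<span class="symbol"\s*type="(\w+)">)[^<]*(<\/span>)/g,
    function (whole, open, name, close) {
      return SYMBOLS.hasOwnProperty(name)
        ? open + SYMBOLS[name] + close
        : whole;
    }
  );
}
```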
The system also relies on being able to convert character symbols directly into Isabelle's
Isar representation. This is needed when users wish to declare new theorems to prove.
Again, keeping with the style of hiding away prover syntax, the user should only need
to deal with proper logical symbols when entering the theorem expression. In order
to simplify entering these symbols, the user should be presented with clickable buttons
that add these symbols to the text area used for declaring a theorem.
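The conversion in this direction can be sketched as a character-by-character substitution. The table TO_ISABELLE and the function toIsabelle are hypothetical names; the targets are Isabelle's usual \<name> symbol notation, and the choice of \<longrightarrow> for → is an assumption about the object logic in use.

```javascript
// Hypothetical table from logical characters to Isabelle symbol notation.
var TO_ISABELLE = {
  "\u2200": "\\<forall>",          // ∀
  "\u2203": "\\<exists>",          // ∃
  "\u00AC": "\\<not>",             // ¬
  "\u2227": "\\<and>",             // ∧
  "\u2228": "\\<or>",              // ∨
  "\u2192": "\\<longrightarrow>"   // → (assumed to denote implication)
};

// Rewrites every known logical character; all other text is kept as is.
function toIsabelle(text) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    out += TO_ISABELLE.hasOwnProperty(ch) ? TO_ISABELLE[ch] : ch;
  }
  return out;
}
```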
5.3.4.15 Right-click undo
The user interface should allow users to undo steps in the proof (for both completed and
uncompleted proofs) by right-clicking on the proof step. A context menu should appear
with the choice to undo this step. In order not to confuse the user, right-click should
be disabled for the rest of the user interface to prevent the default browser right-click
menu from appearing.
5.3.4.16 Customizability
The user interface should be dynamic in terms of users being able to customize the
layout by moving, hiding, and toggling different dialogs, panels and boxes. This allows
users to arrange the layout of the interface to their own requirements, if they so wish
(e.g. drawing proofs vertically and showing the underlying Isabelle proof script).
5.4 User-story Walkthrough
This section runs through a user-story for performing a proof in first-order logic. The
theorem to be proven by the user is (¬(∀x. P x)) → (∃x. (¬(P x))). The theorem was
chosen for demonstration as it is a non-trivial theorem to prove using only natural
deduction rules, and thus a good example to show how the system can be of use.
The first thing that the user sees when accessing the web application is a login prompt
(see Figure 5.9). The user types in their allocated username, and presses the login button
(note that the system does not prompt for a password, as the creation of a proper user
account system with password authentication lay outside the project's scope).
Figure 5.9: Login screen
The user is now presented with an empty desktop (see Figure 5.10). To create a new
proof script, the user presses the new file button in the menu on the left.
Figure 5.10: Empty desktop
The user is now presented with a window that asks for a name for the file (see Figure
5.11). The user enters example as the name of the file and clicks the create button.
The .thy file extension will be automatically added to the name if the user omits it.
Figure 5.11: Creating a new file
The user is returned to the desktop. A heading will have appeared with the name of
the file (see Figure 5.12). So far the proof panel is empty, as the file does not contain
any theorems yet.
Figure 5.12: Empty file created
The user is now able to add a theorem by navigating to the script heading in the menu
and clicking the add theorem button (see Figure 5.13). A window appears where the
user can enter the theorem to prove and give the theorem a name (see Figure 5.14).
The user enters the theorem's assumptions (if any) in the Assumptions text box and
the goal to prove in the Goal text box. In order to simplify the process of entering
logical symbols, the user can press the corresponding buttons. The logical symbol will
then be inserted into the text box in focus.
Figure 5.13: Add theorem button
Figure 5.14: Theorem definition prompt
The result of the user creating the theorem is shown in Figure 5.15. A proof box has
appeared in the blank space to the right. This shows the theorem that the user aims to
prove.
Figure 5.15: New theorem added to script
The user now selects the proof to work on by clicking in the empty area in the middle
of the proof box. The area turns red to indicate that it has been selected. The user
now applies the → i rule backwards by selecting it in the backwards panel of the main
menu (see Figure 5.16). The resulting state of the proof is shown in Figure 5.17.
Figure 5.16: → i button
Figure 5.17: Applying → i backwards
The user continues the proof by selecting the subgoal, navigating to the Isabelle panel,
and applying the PBC rule (see Figure 5.18). The PBC rule is under the Isabelle panel
as it is an additional rule (not a basic natural deduction rule). The results are shown
in Figure 5.19.
Figure 5.18: PBC button
Figure 5.19: Applying PBC backwards the 1st time
The next step for the user is to apply the ¬ε rule backwards (see Figure 5.20). The
system prompts the user for the expression that is to be contradicted. This is a tricky
step, and involves the user making a decision as to what to choose as the new subgoals
to be proven. Based on the user input the system introduces two new subgoals; the
input expression and the negated version of it (see Figures 5.21 and 5.22).
Figure 5.20: ¬ε button
Figure 5.21: Specifying new subgoals (1st ¬ε backwards)
Figure 5.22: Applying ¬ε backwards the 1st time
The first subgoal is easily completed by applying an assumption to it (see Figures 5.23
and 5.24).
Figure 5.23: Assumption button
Figure 5.24: Closing a branch with an assumption
Next, the user applies the ∀i rule backwards (see Figure 5.25).
Figure 5.25: Applying ∀i backwards
Another application of the PBC rule is needed (Figure 5.26).
Figure 5.26: Applying PBC backwards the 2nd time
The user applies another ¬ε step (see Figures 5.27 and 5.28).
Figure 5.27: Specifying new subgoals (2nd ¬ε backwards)
Figure 5.28: Applying ¬ε backwards the 2nd time
The first unclosed subgoal is closed by the application of an assumption. The user then
applies the ∃i rule backwards on the second subgoal (see Figures 5.29 and 5.30). The
user is prompted with a request to instantiate the quantified variable, which the user
sets to x. The new subgoal is easily closed by an assumption application. The proof is
now complete, as indicated by the change in colour (see Figure 5.31).
Figure 5.29: Instantiating a quantified variable
Figure 5.30: Applying ∃i backwards
Figure 5.31: The finished proof
Chapter 6
Evaluation
6.1 Introduction
This chapter covers the evaluation of the system that has been developed. The evaluation
should determine if the system has been built correctly (verification) and that the
right system has been built (validation). Verification was done by software testing and
validation by a user test.
6.2 Test Data
The system was tested on two sets of example theories: the development set and the
evaluation set. The development set was used during the implementation of the system and
the evaluation set was used to test the final implemented system. Each set involved theories
from a varied spectrum of difficulty ranging from simple propositional logic proofs to
challenging first-order predicate logic proofs. The development set should be representative
of the evaluation set, but should not contain the same theories. It is important that
the development and the evaluation sets are different so as to be able to verify that the
system should work for most natural deduction proofs, rather than being specifically
catered to work for the proofs in the development set.
6.3 Verification
Verification involves checking that the system works correctly (i.e. does not have errors).
The testing methods used for this project were black-box testing and glass-box testing
(also known as white-box testing).
Verification of the system involved the following tests where appropriate:
• Unit testing
• Integration testing
• System testing
It is difficult to automate the testing of the user interface itself. One issue is that user
interfaces have a tendency to undergo frequent and extensive changes. As a result the
test cases need to be updated after each change to follow the new behaviour, which
is tedious to perform. Another issue that is especially true for this project is that it is
difficult to ensure that the underlying system is in the appropriate state for what the
test case expects. It was decided that the user interface would not be tested using an
automated test suite. Rather, it would be verified by performing user-stories based on
expected usage in order to catch discrepancies from expected results.
Testing was performed during each iteration of the implementation in order to catch
potential bugs at an early stage.
6.3.1 Testing Framework and Tools
Testing of the PHP web service initially involved utilizing PHP's built-in logging
functionality in order to monitor results during processing. However, it was decided that
since the web service would involve several steps of parsing and transformation of data,
it would be advantageous to utilize an automated testing framework to verify that the
output generated from each step was correct. PHPUnit [8], a PHP testing framework
similar to the well known JUnit [21], was utilized. Test cases were created that were
run against the code to determine if the transformations worked correctly. In the cases
where function outputs were incorrect, the testing framework flagged a warning and
an explanation as to what the discrepancy between expected and actual results was.
Thus, it was easy to identify what had failed, change the code to fix the error and
re-test to verify that the new changes had solved the problem. However, the actual
communication with the PGKit broker was not tested with the framework. This was
due to the PGKit broker being relatively unstable and difficult to restart, which made it
difficult to maintain consistency in automated tests (i.e. a test might suddenly fail even
though no system code had been changed). It was determined that time would be better spent
on other parts of the project than automating the communication testing.
It was decided that a testing framework would not be used for development of the
Javascript/AJAX client code. Currently, automatic testing frameworks are not widely
used in Javascript web application development, other than for testing Javascript
function libraries. JSUnit [25] is one automated testing framework available for
Javascript code. The problem with using testing frameworks for AJAX web pages is
that it involves a lot of work creating unit tests in order to mimic event triggering and
DOM modification. Additionally, it is not suitable for testing systems that rely on
using asynchronous message passing, as testing this would involve holding back the testing
framework until messages were returned and callback functions had finished processing.
The loss of a network message could thus flag a false positive error. It was thus felt
that creating complicated test cases would be a misguided use of time as the interface
changed rapidly from day to day in terms of functionality and how the underlying code
worked.
The web application was debugged and tested using the log4javascript logging library
[18] and the Firebug [24] web development package. The latter contains a collection of
tools practical for web application development. Firebug provides methods to inspect
the DOM, monitor live Javascript processing and keep track of incoming and outgo-
ing messages. Firebug also makes it possible to monitor network activity in order to
determine response time.
6.3.2 Unit Testing
Unit testing involves testing system modules and components separately from the whole
system. Testing at the unit stage makes it easier to identify where errors occur compared
to testing only at the higher level of the system. Unit testing was done during
the implementation of the system, and involved both glass-box and black-box testing
methods. Furthermore, for the web service, the running of unit tests was in several
cases automated using the PHPUnit testing framework.
Unit testing was to a lesser extent performed on the web client due to the close coupling
of the function code to the user interface (i.e. DOM structures had to exist in order
for certain functions to be tested properly). However, where appropriate and allowable,
unit testing was performed.
6.3.3 Integration Testing
Integration testing involves testing the integration between different components of the
system, and how they interact. Testing of this aspect of the system relied on using the
black-box testing method.
Integration testing was performed by intercepting and monitoring messages sent between
the components of the system, and inspecting generated log files.
6.3.4 System Testing
System testing involves testing the system as a whole to determine how well the system
matches the specifications. System testing relied on using the black-box testing
method. When deviations from expected results were found, the errors were tracked
down and the problematic code modified to remove the discrepancy between expected
and actual results. The system was then re-tested to verify that the problem had been
properly addressed. The system went through several of these iterations until it
was deemed to be in a satisfactory state to be tested on users.
6.3.5 Results
In general, the system reached an acceptable level to be used for user testing. It passed
the automated tests done on the PHP code, as well as the manual tests performed.
However, due to the PGKit broker's problems with undoing object ranges and intermittent
lock-ups, the system was somewhat unstable (see Subsection 5.2.1). These were
problems that lay outside of the developed system and were thus difficult to mitigate or
overcome.
Early in the user test (to be discussed in Section 6.4.1) it became clear that there was an
error with adding new theorems to an open script. The problem identified was that the
system did not properly capture the error that occurred when users entered an invalid
formula (i.e. one that is not a well-formed formula). The JavaScript function responsible
for adding new objects to the proof script checks web server responses for errors before
processing results. However, in the case mentioned above, the server-side system was not
returning proper failure messages (i.e. the error message was not marked up with
<failstep></failstep> tags). Although the process of adding proof commands to scripts
utilizes the same functions (both client-side and server-side) as the process of adding
theorems, the former process does not experience this problem (it returns proper error
messages upon invalid syntax or an invalid proof step). To overcome this, the client code
responsible for adding a theorem had to stop using the generic AJAX function for
sending proof script editing requests. The AJAX calls were instead coded directly into
the function for creating new theorems, as error checking had to be done differently for
this specific activity.
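The kind of response check involved can be sketched as follows. This is a minimal
illustration under stated assumptions, not the project's actual code: the function
names are hypothetical, the <proofstate> success marker is an invented placeholder,
and the source does not say how the theorem-specific check was really implemented.
Only the <failstep> tag is taken from the text above.

```javascript
// Hypothetical sketch: all function names and the <proofstate> success marker
// are assumptions made for illustration.

// Return the text inside the first <failstep> element, or null if there is none.
function extractFailStep(responseText) {
  const match = /<failstep>([\s\S]*?)<\/failstep>/.exec(responseText);
  return match ? match[1].trim() : null;
}

// Generic check: a response counts as a failure only if it carries <failstep> markup.
function checkGenericResponse(responseText) {
  const failure = extractFailStep(responseText);
  return failure === null ? { ok: true } : { ok: false, message: failure };
}

// Stricter check for adding a theorem: a response lacking the assumed success
// marker is also treated as a failure, so unmarked error text is not missed.
function checkTheoremResponse(responseText) {
  const generic = checkGenericResponse(responseText);
  if (!generic.ok) {
    return generic;
  }
  if (!/<proofstate>/.test(responseText)) {
    return { ok: false, message: responseText.trim() };
  }
  return { ok: true };
}

// An error reply that the server failed to mark up (made-up sample text).
const unmarkedError = "some unmarked error text";
console.log(checkGenericResponse(unmarkedError).ok); // true: the failure goes unnoticed
console.log(checkTheoremResponse(unmarkedError).ok); // false: the failure is caught
```

Requiring positive evidence of success is only one possible design; it trades a
slightly stricter contract with the server for not depending on error markup.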
The system's responsiveness was somewhat difficult to evaluate in terms of getting
meaningful quantitative data. The complicating factor was that, since the system could
not be deployed on the DICE system, it had to be run on a machine outside of the
University of Edinburgh. Unfortunately, that machine had limited internal memory and
was connected to a relatively slow Internet connection. As the system is highly
dependent on the hardware environment and network speed, this host was far from
ideal. However, the system was tested with 4 simultaneous users accessing it through
the Internet, with acceptable results in terms of system responsiveness. Testing the
response time on the local network (rather than over the Internet connection, as this
was known to be a limitation) gave response times ranging from 60ms to 1500ms,
depending on the size of the proof script, as most of the delay is due to Isabelle's
processing.
6.4 Validation
Validation involves checking that the right system has been built and that it satisfies the
needs of the user. The validation process sought to determine how well the implemented
system addressed user needs and whether it reached an acceptable level of usability. In
order to determine this, a user test was performed.
6.4.1 User Test
The user test involved testing the implemented system on a small sample of users
utilizing the system to perform interactive theorem proving. This would make it possible
to evaluate the appropriateness of creating proof scripts using a graphical user interface
with point-and-click functionality. Although the assessment was bound to be relatively
subjective, it should give a fair indication of the appropriateness of the implemented
system.
Due to the specialized nature of the study area, it was required that test subjects have
at least some knowledge about symbolic logic and creating logical proofs. This limited
who could participate in the study, and resulted in the group of test subjects being quite
small. Ideally, the system should have been tested on novice students learning logic.
However, as the project took place during the summer months, the supply of ideal test
candidates was limited. Undergraduate informatics students would have been good test
candidates, but they were not present at the university during this time period. As a
result, the system was tested on PhD students and academic staff who have experience
with the use of logic and theorem provers. The test candidates were all expert users
of theorem provers and so were somewhat outside the scope of the intended audience.
However, it was determined that these test candidates were the most appropriate to test
the system on given the circumstances, as it would not be possible to test the system
on users without knowledge of symbolic logic.
6.4.1.1 Questionnaire
A questionnaire was developed to capture the participants' evaluation of the system's
usability, appropriateness, user interface and stability. The questionnaire was divided
into 6 parts.
Part 1 aimed at assessing the test participant's experience with logic and theorem
proving. More importantly, it also captured information about what style of natural
deduction notation they found easiest to follow, purely linear proofs or box-style
proofs.
Part 2 contained a one-page user introduction to the system (see Appendix C). This
introduced users to the system through a pre-created demonstration file containing an
unfinished proof. By following the guide, the user learned how to open a file, apply a
proof rule, create a new file and add a new theorem.
Part 3 listed 7 theorems, ranging from propositional logic to first-order predicate logic,
and asked the user to perform a proof of 2 or 3 of these using the system.
Part 4 aimed to capture their experiences with performing the proofs in terms of potential
errors in the system, the presentation of box-proofs and whether they felt that the system
is of any help when performing natural deduction proofs. It also captured information
about whether the system would be of any use to themselves.
Part 5 aimed to evaluate the user interface itself. The user interface evaluation relied
on the participant's subjective evaluation rather than measuring user performance, as
it was felt that measuring user performance (e.g. tracking time between user actions or
counting clicks) would not contribute much to the evaluation of this system. Also,
a cross-comparison study between textual proof assistants (such as Proof General) and
the implemented system was not performed, as it would be difficult to gather meaningful
quantitative data from such a study given that the two are aimed at different types of
users. Additionally, a cross-comparison study is time-consuming to run, and it is
difficult to draw concrete results from it.
In order to quantify the subjective evaluation of the user interface and thereby indicate
its appropriateness, the System Usability Scale (SUS) was used [15]. The SUS
uses a pre-set questionnaire (see Appendix C) containing 10 Likert scale questions
regarding how the participant experienced using the system under consideration. The
score calculation gives a value between 0 and 100 (where 0 is low usability and 100 is
high usability), meant to be used as a relative usability estimate.
Part 6 of the questionnaire aimed at capturing any additional comments and feedback
about the system that the user might wish to contribute.
6.4.2 Results of User Test
The user test initially involved 6 participants. However, one participant never returned
the questionnaire and was excluded from the analysis of the study. Additionally, one
questionnaire lacked answers to part 5 (the SUS evaluation).
6.4.2.1 Questionnaire part 1
The general information extracted about the test subjects was that they were reasonably
to highly confident with symbolic logic and natural deduction, as well as with
performing pen-and-paper proofs. 60% of the participants knew how to use box-proof
style notation, and 80% had experience with using the Isabelle theorem prover. The
relatively high number of participants that knew box-style notation was beneficial
for evaluating the system. However, it is slightly unfortunate that so many of the
participants had previous experience with Isabelle, as this made it difficult to evaluate
how much knowledge of Isabelle one needs (which we would hope to be none).
6.4.2.2 Questionnaire parts 2-4, 6
These were grouped together, as parts 2 and 3 do not ask questions, and parts 4
and 6 received overlapping replies.
All the respondents managed to use the system to perform theorem proving. Additionally,
all participants felt that the system made it easier to perform theorem proving
in natural deduction in terms of representing proofs and not needing prover-specific
knowledge. Also, no issues were flagged as to how box-proofs were displayed
(i.e. no discrepancies from what they associate with box-proofs). All the
participants felt that the system would be useful for novices.
The beneficial features reported by the users were (paraphrased and duplicates
removed):
• Box-proofs give a good overview of where you are in the proof and how [the
proof] has progressed
• Easy to use
• No need to know prover syntax
• Helpful hints
• Easy to enter new theorems
• Only web browser needed, no install or compilation required
• Makes it easy to apply rules
• Less confusion due to limit on available rules
• No need for instructions
• Clear and easy to use user interface, a big improvement over Proof General
• Point-and-click proofs easier to perform
There were also some negative aspects of the system. Unfortunately, all of the users
experienced problems with undoing steps (see the problem with the PGKit broker noted
in Subsection 5.2.1). Users also experienced intermittent system crashes (found to be
due to PGKit broker deadlocks) and problems with entering new theorems, which also
corrupted the script file if it was saved (this was corrected early in the user test, as
noted in Subsection 6.3.5).
Other limitations identified by the users were:
• Failure messages were not very informative
• Would like to be able to reuse lemmas, as the limited number of available rules
restricts the system's usefulness
• Would like shortcuts to functions rather than having to point-and-click each time
• Latency reduced responsiveness
Furthermore, all the participants noted that the system would probably not be of use
to them as they were at a PhD / academic lecturer level, and thus rely on using proof
rules that lie outside of the scope of natural deduction (e.g. induction). They would
usually not carry out such non-trivial proofs interactively but rather use automated
tactics. This was expected, as the system was intended for users in the novice to
mid-experienced range rather than experts.
6.4.2.3 Questionnaire part 5
SUS scores were determined using the following calculation [15] (paraphrased from the
SUS chapter):
1. For each odd-numbered question, subtract 1 from the scale position.
2. For each even-numbered question, subtract the scale position from 5.
3. Sum the resulting scores and multiply by 2.5 to get the usability estimate.
The individual results of the questionnaires are shown in Table 6.1.
Participant ID    Calculated SUS score
1                 85/100
2                 75/100
3                 82.5/100
4                 55/100
Table 6.1: SUS evaluation scores
The average SUS score calculated was 74.4 (297.5/4 = 74.375), which is a fairly high
usability estimate. This estimate gives a strong indication that the users found the
system to be of potential use.
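The SUS calculation described in this subsection can be written as a short JavaScript
sketch. The function name susScore and the sample response vector are illustrative
assumptions; responses are taken to be scale positions from 1 to 5, and only the four
scores from Table 6.1 come from the study itself (the participants' individual answers
are not reproduced here).

```javascript
// Compute a SUS score from ten Likert responses, each a scale position from 1 to 5.
function susScore(responses) {
  if (responses.length !== 10) {
    throw new Error("SUS expects exactly 10 responses");
  }
  let sum = 0;
  responses.forEach((position, index) => {
    const questionNumber = index + 1;
    // Odd-numbered questions contribute (position - 1), even-numbered (5 - position).
    sum += questionNumber % 2 === 1 ? position - 1 : 5 - position;
  });
  return sum * 2.5;
}

// Mean of the four scores reported in Table 6.1.
const reportedScores = [85, 75, 82.5, 55];
const meanScore = reportedScores.reduce((a, b) => a + b, 0) / reportedScores.length;

console.log(susScore([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])); // 50 (all-neutral answers)
console.log(meanScore); // 74.375
```

An all-neutral questionnaire lands exactly in the middle of the scale, which is a
quick sanity check on the alternating treatment of odd and even questions.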
6.4.3 Summary of Results
Although the user evaluation brought to light problems with the stability of the system,
all the participants managed to use the system to perform proofs. The high average
usability estimate and the positive questionnaire feedback received
are taken as an indication that the system is of practical use. The questionnaire indicated
that the system does help in visualizing and performing natural deduction proofs and
manages to address the problems outlined in the project description. However, there
is a range of improvements that are recommended (e.g. stability, additional
functionality and more informative error messages).
Chapter 7
Discussion
7.1 Introduction
This chapter summarizes the project, including its achievements and limitations, and
puts it in a wider context for review.
7.2 Achievements
The project was successful in achieving its outlined aims and objectives. The system
successfully provides a graphical user interface that allows users to perform
point-and-click creation of proof scripts and visualizes the proofs in a graphical style
that is easy to follow. The system utilizes a well-known and sound interactive theorem
prover. In addition, the system utilizes a web-based client/server architecture, allowing
the user to use the system remotely without having to install any software other than
a web browser. The user test indicated that users found it to be quite usable, given
the high average usability estimate and positive feedback.
Although there exist systems offering similar functionality in terms of visualizing
proofs, point-and-click interaction, and providing interactive theorem proving through a
client/server system, there does not, as far as this author is aware, exist any other
system that provides all these functions simultaneously. Pandora runs as a Java applet,
so is run completely on the client side. Furthermore, it does not use a standard
interactive theorem prover to verify proofs.
ProofWeb is distributed, responsive, utilizes a proper interactive theorem prover and has
the ability to render proofs. However, it does not allow point-and-click proof creation,
and it currently only allows proof rendering to be done using text symbols rather than
pixel-based graphics (making it slightly more difficult to read). Another advantage
our system has is that it is less coupled to the respective theorem prover, as it uses the
PGIP protocol and the PGKit broker, which are meant to be non-prover-specific. The
ProofWeb system allows the use of a limited number of interactive provers other
than Coq (e.g. Isabelle). However, few of the functionalities offered by ProofWeb
when using Coq are available with the others. Additionally, ProofWeb does not provide
a common interface to interact with theorem provers (no general protocol is used for
interaction), relying instead on ad-hoc coding to utilize their functions. Our system is
not completely decoupled from Isabelle either, in that it only understands and
generates commands in the Isabelle Isar language. However, changing to use another
theorem prover should theoretically not be too difficult, as most of the Isabelle-dependent
functionality is at the web-application side (the XML parsing at the client side would
require changes as well, possibly only needing this transformation to be disabled, as other
provers can output results in XML). However, this relies on PGIP wrappers becoming
available for other interactive theorem provers.
7.3 Limitations
The developed system has a fair share of limitations as well. Unfortunately, the system
remains in a somewhat unstable state. This is mostly attributed to the PGKit broker
and the numerous problems experienced with using it (lack of functionality, frequent
failures, errors in functionality such as undoing objects as mentioned in Section 5.2.1).
The system also experiences some user interface quirks (e.g. disappearing data when
clicking outside a window, display panels overflowing past their borders if there is too
much content, as mentioned in Section 6.4.2). Other limitations that should be addressed
if time allowed are the inability to use proven lemmas in proofs and the fact that the
system's error messages are not very informative.
7.4 Criticism
There are several design decisions taken that, in retrospect, might not have been the
best solutions. These will now be discussed.
7.4.1 General Architecture
The high level architecture could be simplified. The current architecture relies
on a relatively long chain of dependent components at the server side. Ideally, the system
should be streamlined to reduce overhead. One possible solution is to remove the need
for the PGKit broker by rolling its functionality into the web service, leaving
only one node between the web client and the Isabelle theorem prover. This would
reduce latency and potential errors, and make the system easier to maintain. Loose
coupling could still be maintained if the PGIP protocol was still used for interacting
with the underlying interactive theorem prover, as it is not prover-specific.
7.4.2 PGKit Broker
The poor state of the PGKit broker resulted in a wide range of workarounds having
to be devised in order to utilize the sought-after functionality. If the full information
regarding the state of the PGKit broker had been available at the start of the project,
the system design currently used might not have been chosen. The fact that few PGIP
commands are implemented in the PGKit broker, and that the system is unstable, are
issues that the PGKit authors need to address if they intend the broker to be used for
new proof editors.
7.4.3 Isabelle
It was very unfortunate that the XML output functionality of Isabelle was taken out.
Furthermore, David Aspinall's explanation as to why it was taken out was quite puzzling.
One has to question why it was not left in the system, even though it was not
much used at the time.
Having the Isabelle results marked up in XML by Isabelle itself would have made the
system less brittle, as the web service currently has to perform the mark-up based on
textual matching.
Furthermore, it is unfortunate that Isabelle does not utilize labelling of assumptions.
This would have made the system easier to implement and less brittle, as it would be
possible to refer to labels when specifying rule premises rather than entering the whole
expression. Labelling would also be useful when creating the side explanation of a proof
step result.
7.4.4 Web Client
The rendering of proof results into an HTML representation is currently quite
process-intensive. It was felt that the code responsible for rendering could have been
improved to run quicker and thus make the system feel more responsive.
It is regrettable that the system does not work properly with the Internet Explorer web
browser. However, it is not alone in this (cf. ProofWeb).
Furthermore, there are usability issues with the system's user interface that could benefit
from more attention (e.g. making it more intuitive, performing more error checking on user
input and providing helpful error messages as to what part of the rule application failed).
7.5 Outlook on Subject
Following is an outlook on possible changes occurring in the subject field of user interfaces
for proof editors.
The ProofWeb project seems to have found a niche and has secured funding to continue
being developed for another 1½ years. The system has expanded rapidly, and the latest
version seems promising as it allows both Fitch-style and Gentzen-style
visualization of proofs. The system, according to [31], has fully replaced the need for
other proof editors such as Proof General, even for expert use.
Pandora is a tried-and-tested system that has been fully embraced in teaching at Imperial
College London. However, it does not seem that the system has any strong points
that distinguish it from other similar systems. It seems unlikely that the system will
be much adopted outside of that educational institution.
The Jape system has not undergone any large changes during the last few years. As
a result, we do not expect that any significant new contributions to the field will come
from it.
The Proof General for Eclipse project seems to be progressing with the development of the
system that is meant to replace the Proof General Emacs user interface [3]. It is the
author's opinion that the new system does not bring many improvements in terms
of usability and functionality, thus it is unlikely that it will change the field significantly.
However, an interesting change within the Proof General project is the recent release
of the PGIP 3.0 protocol [5]. The protocol has been simplified substantially to remove
unnecessary information and redundant functions. The new protocol is aimed towards
being used by stateless network clients. This shift in direction addresses some of the
problems experienced during the implementation of the system. However, there is no
word of any applications that are being developed to use this new protocol revision.
7.6 Future Work
If time permitted, there are several issues that could be improved in the system. The
most important improvements would be to improve the stability by modifying
the PGKit broker source code to deal with the critical errors experienced, to support
the Internet Explorer web browser and to further improve the usability of the user
interface. Furthermore, the system would benefit from allowing users to reuse proven
lemmas in proofs. In a bigger picture, it would be desirable for the system to support
the creation of proofs using rules that lie outside of natural deduction.
Putting the system aside, there are several issues within the field of interactive theorem
prover user interfaces that warrant more work. One area of potential study is that
of streamlining proof scripting by reducing the differences between scripts in
different systems, allowing users to move between systems without having to learn each
system from scratch. Another interesting issue is that of creating user interfaces that
can change the way they render a proof based on the type of theorem proving being
performed.
7.7 Final Remarks
The implemented system is in a usable state, and user test evaluations gave evidence
that it is successful in making it easier to perform proofs (removing the need for
prover-specific knowledge, removing the need to install software locally and providing
the ability to visualize proofs). It managed to fill a small gap in the field of proof editor
user interfaces that no other system (to the author's best knowledge) has properly
addressed.
The author's experience of undertaking this project was overall quite positive. Although
the system was less straightforward to implement than initially foreseen, a usable system
was successfully developed that managed to address all the aims and objectives outlined
for the project. There were times during the project when it seemed unlikely that all
the objectives could be met. However, solutions were found to all these hindrances.
On a more personal note, the author feels that it was rewarding to undertake
and successfully complete such a relatively large project.
Bibliography
[1] Konstantine Arkoudas. Simplifying Proofs in Fitch-Style Natural Deduction Systems.
Journal of Automated Reasoning, 34(3):239-294, 2005.
[2] David Aspinall. Proof General. http://proofgeneral.inf.ed.ac.uk. Accessed
13. March 2007.
[3] David Aspinall. Proof General Eclipse. http://proofgeneral.inf.ed.ac.uk/eclipse.
Accessed 13. March 2007.
[4] David Aspinall. Proof General Kit - White Paper.
http://homepages.inf.ed.ac.uk/da/papers/drafts/white.pdf, July 2003. Accessed
13. March 2007.
[5] David Aspinall and Christoph Lüth. PGIP, the Proof General Interaction Protocol.
http://proofgeneral.inf.ed.ac.uk/wiki/Main/PGIP. Accessed 10. August 2007.
[6] David Aspinall and Christoph Lüth. Proof General meets IsaWin: Combining Text-Based
and Graphical User Interfaces. Electronic Notes in Theoretical Computer Science,
103:3-26, November 2004.
[7] David Aspinall and Christoph Lüth. Commentary on PGIP.
http://proofgeneral.inf.ed.ac.uk/Kit/docs/commentary.pdf, March 2007. Accessed
10. August 2007.
[8] Sebastian Bergmann. PHPUnit. http://www.phpunit.de. Accessed 13. July 2007.
[9] Yves Bertot, Ahmed Amerkad, Pascal Lequang, Loïc Pottier, and Laurence Rideau.
Pcoq: A java-based user-interface for Coq. http://www-sop.inria.fr/lemme/pcoq.
Accessed 13. March 2007.
[10] Richard Bornat. Natural Deduction Proof and Disproof in Jape.
http://jape.comlab.ox.ac.uk:8080/jape/DOCUMENTS/CURRENT/natural_deduction_manual.pdf,
March 2005. Accessed 13. March 2007.
[11] Richard Bornat and Bernard Sufrin. Jape. http://www.jape.org.uk. Accessed
13. March 2007.
[12] Richard Bornat and Bernard Sufrin. Animating formal proof at the surface: The
jape proof calculator. Computer Journal, 42(3):177-192, 1999.
[13] Krysia Broda. Pandora Help - Rules.
http://www.doc.ic.ac.uk/pandora/bin/help/Rules/rules.html. Accessed 13. March 2007.
[14] Krysia Broda, Jiefei Ma, Gabrielle Sinnadurai, and Alex Summers. Friendly e-tutor
for Natural Deduction. In Proceedings of Teaching Formal Methods: Practice and
Experience (TFM 2006), London, UK, 2006.
[15] John Brooke. SUS - A quick and dirty usability scale. Digital Equipment Corporation,
Ltd., 1986.
[16] Ewen Denney, John Power, and Konstantinos Tourlas. Hiproofs: A Hierarchical
Notion of Proof Tree. Electronic Notes in Theoretical Computer Science, 155:341-359,
2006.
[17] Bob Dougherty and Alex Wade. VisCheck. http://www.vischeck.com/. Accessed
20. August 2007.
[18] Tim Down. log4javascript - a JavaScript logging framework.
http://www.timdown.co.uk/log4javascript/. Accessed 13. July 2007.
[19] Jens Dørup, Michael Schacht Hansen, Lars Riisgaard Ribe, and Kristoffer Larsen.
A comparison of technologies for database-driven websites for medical education.
Medical Informatics and the Internet in Medicine, 27(4):281-289, 2002.
[20] Thomas Fuchs. script.aculo.us - web 2.0 javascript. http://script.aculo.us.
Accessed 13. March 2007.
[21] Erich Gamma and Kent Beck. JUnit. http://www.junit.org. Accessed 13. July
2007.
[22] Jesse James Garrett. Ajax: A New Approach to Web Applications.
http://www.adaptivepath.com/publications/essays/archives/000385.php, February
2005. Accessed 13. March 2007.
[23] Michael J. C. Gordon, Robin Milner, and Christopher P. Wadsworth. Edinburgh
LCF, volume 78 of Lecture Notes in Computer Science. Springer, 1979.
[24] Joe Hewitt. Firebug - Web Development Evolved. http://www.getfirebug.com/.
Accessed 13. July 2007.
[25] Edward Hieatt. JSUnit. http://www.jsunit.net/. Accessed 13. July 2007.
[26] Martin Homik and Andreas Meier. Designing a Proof GUI for Non-Experts -
Evaluation of an Experiment. In Proceedings of the International Workshop on
User Interfaces for Theorem Provers (UITP 2005), pages 160-178, Edinburgh,
Scotland, 2005.
[27] Michael Huth and Mark Ryan. Logic in computer science: modelling and reasoning
about systems. Cambridge University Press, New York, NY, USA, second edition,
2004.
[28] Jacques Fleuriot. Designing and Implementing an Interactive Natural Deduction
Proof Editor for Isabelle.
http://homepages.inf.ed.ac.uk/alex/msc/project.php?number=P014. Accessed
10. August 2007.
[29] Jakob Nielsen. Heuristics for User Interface Design.
http://www.useit.com/papers/heuristic/heuristic_list.html, 2005. Accessed
10. August 2007.
[30] Cezary Kaliszyk. Web Interfaces for Proof Assistants. In Proceedings of the
International Workshop on User Interfaces for Theorem Provers (UITP 2006), Seattle,
USA, 2006.
[31] Cezary Kaliszyk, Freek Wiedijk, Maxim Hendriks, and Femke van Raamsdonk.
Teaching logic using a state-of-the-art proof assistant. In Proceedings of the
International Workshop on Proof Assistants and Types in Education (PATE'07), pages
33-46, Paris, France, June 2007.
[32] Loïc Pottier. LogiCoq.
http://wims.unice.fr/wims/wims.cgi?module=U3/logic/logicoq. Accessed 13. March 2007.
[33] National Center for Supercomputing Applications at the University of Illinois.
Common Gateway Interface. http://hoohoo.ncsa.uiuc.edu/cgi/. Accessed 10.
August 2007.
[34] Tobias Nipkow and Larry Paulson. Isabelle. http://isabelle.in.tum.de/. Accessed
10. August 2007.
[35] Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. Isabelle/HOL - A
Proof Assistant for Higher-Order Logic, volume 2283 of Lecture Notes in Computer
Science. Springer, 2002.
[36] John K. Ousterhout. Scripting: Higher-level programming for the 21st century.
IEEE Computer, 31(3):23-30, 1998.
[37] Lawrence C. Paulson. Isabelle: The next 700 theorem provers. In P. Odifreddi,
editor, Logic and Computer Science, pages 361-386. Academic Press, 1990.
[38] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach.
Prentice-Hall, Englewood Cliffs, NJ, second edition, 2003.
[39] J. Siekmann, S. M. Hess, C. Benzmüller, L. Cheikhrouhou, D. Fehrer, A. Fiedler,
M. Kohlhase, K. Konrad, E. Melis, A. Meier, and V. Sorge. LΩUI: A Distributed
Graphical User Interface for the Interactive Proof System ΩMEGA. In Proceedings
of the International Workshop on User Interfaces for Theorem Provers (UITP-98),
Eindhoven, Netherlands, 1998.
[40] Sam Stephenson. Prototype js. http://www.prototypejs.org. Accessed 13.
March 2007.
[41] The Coq Development Team. The Coq proof assistant. http://coq.inria.fr/.
Accessed 13. July 2007.
[42] The HOL 4 Development Team. HOL 4 Kananaskis 4. http://hol.sourceforge.
net/. Accessed 13. July 2007.
[43] Norbert Völker. Thoughts on Requirements and Design Issues of User Interfaces for
Proof Assistants. Electronic Notes in Theoretical Computer Science, 103:139159,
2004.
[44] W3CWeb Services Architecture Working Group. Web Services Architecture. http:
//www.w3.org/TR/ws-arch/. Accessed 13. March 2007.
[45] Daniel Winterstein, David Aspinall, and Christoph Lüth. Proof General Kit. http:
//proofgeneral.inf.ed.ac.uk/Kit. Accessed 10. August 2007.
[46] Daniel Winterstein, David Aspinall, and Christoph Lüth. Parsing, Editing, Prov-
ing: The PGIP Display Protocol. In Proceedings of the International Workshop on
User Interfaces for Theorem Provers (UITP 2005), Edinburgh, Scotland, 2005.
[47] Daniel Winterstein, David Aspinall, and Christoph Lüth. Proof General / Eclipse:
A Generic Interface for Interactive Proof. In Proceedings of the International Joint
Conference on Articial Intelligence (IJCAI 2005), pages 15871588, 2005.
[48] World Wide Web Consortium. Cascading Style Sheets Level 2 Revision 1 (CSS
2.1) Specification. http://www.w3.org/TR/CSS21/. Accessed 10. August 2007.
[49] World Wide Web Consortium. Document Object Model (DOM) Level 3 Core
Specification. http://www.w3.org/TR/DOM-Level-3-Core/. Accessed 10. August
2007.
[50] World Wide Web Consortium. Extensible Markup Language (XML) 1.0.
http://www.w3.org/TR/xml/. Accessed 10. August 2007.
[51] World Wide Web Consortium. HTTP - Hypertext Transfer Protocol.
http://www.w3.org/Protocols/. Accessed 10. August 2007.
[52] World Wide Web Consortium. XML Path Language (XPath).
http://www.w3.org/TR/xpath. Accessed 10. August 2007.
[53] World Wide Web Consortium DOM interest group. Document Object Model
(DOM). http://www.w3.org/DOM/. Accessed 10. August 2007.
[54] Yahoo! Inc. Yahoo UI! Library. http://developer.yahoo.com/yui. Accessed 13.
March 2007.
Appendix A
XML Messages
A.1 PGIP reply
The following PGIP message is a reply from the PGKit to the web client upon the client
requesting to add the command apply (rule_tac [1] conjI) to the proof script. Notice
the amount of information being sent and redundant object list operations performed.
<pgip tag="Broker" id="broker:ubuntu/huh/16785/2007717-16519-Z" class="pd"
      refid="ws" refseq="7990" seq="7991">
  <dispobjmsg>
    <newobj proverid="/huh/16786/1187366719.888" srcid="f20a" objid="a275"
            objposition="a26d" objtype="UnknownType" objstate="unparseable">
      <unparseable>apply (rule_tac [1] conjI)
      </unparseable>
    </newobj>
  </dispobjmsg>
</pgip>
<pgip tag="Broker" id="broker:ubuntu/huh/16785/2007717-16519-Z" class="pd"
      seq="7993">
  <proverstate proverid="/huh/16786/1187366719.888"
               provername="Isabelle 2005/HOL" proverstate="busy"/>
</pgip>
<pgip tag="Broker" id="broker:ubuntu/huh/16785/2007717-16519-Z" class="pd"
      refid="ws" refseq="7990" seq="7994">
  <dispobjmsg>
    <replaceobjs srcid="f20a" replacedfrom="a275" replacedto="a275">
      <delobj proverid="/huh/16786/1187366719.888" srcid="f20a" objid="a275"/>
      <newobj proverid="/huh/16786/1187366719.888" srcid="f20a" objid="a277"
              objposition="a26d" objtype="ProofType" objstate="parsed">
        <proofstep>apply (rule_tac [1] conjI)</proofstep>
      </newobj>
      <newobj proverid="/huh/16786/1187366719.888" srcid="f20a" objid="a278"
              objposition="a26d" objtype="comment" objstate="parsed">
        <whitespace>
        </whitespace>
      </newobj>
    </replaceobjs>
  </dispobjmsg>
</pgip>
<pgip tag="Broker" id="broker:ubuntu/huh/16785/2007717-16519-Z" class="pd"
      refid="ws" refseq="7990" seq="7995">
  <filestatus proverid="/huh/16786/1187366719.888" srcid="f20a"
              newstatus="changed"
              url="file:///home/huh/theories/test3/remove.thy"
              datetime="2007-08-19T00:45:49Z"/>
</pgip>
<pgip tag="Broker" id="broker:ubuntu/huh/16785/2007717-16519-Z" class="pd"
      seq="7996">
  <proverstate proverid="/huh/16786/1187366719.888"
               provername="Isabelle 2005/HOL" proverstate="ready"/>
</pgip>
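The redundancy mentioned above can be made concrete by listing the object operations the client has to process for this single command. The following is a minimal Python sketch, not part of the implemented system: the reply is abridged (most attributes are dropped), the wrapping `<stream>` element is a hypothetical root added only so that the packet sequence parses as one document, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the reply above; most attributes are omitted for brevity.
REPLY = """
<pgip seq="7991"><dispobjmsg>
  <newobj objid="a275" objtype="UnknownType" objstate="unparseable">
    <unparseable>apply (rule_tac [1] conjI)</unparseable>
  </newobj>
</dispobjmsg></pgip>
<pgip seq="7993"><proverstate proverstate="busy"/></pgip>
<pgip seq="7994"><dispobjmsg>
  <replaceobjs replacedfrom="a275" replacedto="a275">
    <delobj objid="a275"/>
    <newobj objid="a277" objtype="ProofType" objstate="parsed">
      <proofstep>apply (rule_tac [1] conjI)</proofstep>
    </newobj>
    <newobj objid="a278" objtype="comment" objstate="parsed">
      <whitespace> </whitespace>
    </newobj>
  </replaceobjs>
</dispobjmsg></pgip>
<pgip seq="7995"><filestatus newstatus="changed"/></pgip>
<pgip seq="7996"><proverstate proverstate="ready"/></pgip>
"""


def parse_stream(reply):
    """Parse a sequence of <pgip> packets by wrapping them in one root."""
    # <stream> is a hypothetical wrapper, not a PGIP element.
    return ET.fromstring("<stream>" + reply + "</stream>")


def object_operations(reply):
    """Return (operation, objid) pairs in document order."""
    return [(el.tag, el.get("objid"))
            for el in parse_stream(reply).iter()
            if el.tag in ("newobj", "delobj")]


def proof_steps(reply):
    """Return the text of every parsed <proofstep> in the reply."""
    return [el.text for el in parse_stream(reply).iter("proofstep")]


if __name__ == "__main__":
    for operation, objid in object_operations(REPLY):
        print(operation, objid)
    print(proof_steps(REPLY))
```

Run on the abridged reply, the sketch reports four object operations (one object created, deleted again, then two more created) for what is ultimately a single proof step.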
A.2 PGIP state display vs. revised XML
The following PGIP message contains a state result returned from Isabelle (not properly
marked-up as processable XML).
<?xml version="1.0"?>
<pgip tag="Isabelle/Isar [broker]" id="/huh/6241/1187523268.329" class="pd"
      refid="broker:ubuntu/huh/6240/2007719-113428-Z" refseq="1032" seq="359">
  <proofstate proverid="/huh/6241/1187523268.329">
    <pgml>
      <statedisplay>proof (prove): step 1

      goal (2 subgoals):
      1. (
      <sym name="lbrakk">\<lbrakk></sym>(
      <atom kind="free">A</atom>
      <sym name="and">\<and></sym>
      <atom kind="free">B</atom>); (
      <atom kind="free">B</atom>
      <sym name="and">\<and></sym>
      <atom kind="free">A</atom>)
      <sym name="rbrakk">\<rbrakk></sym>
      <sym name="Longrightarrow">\<Longrightarrow></sym> (
      <atom kind="free">A</atom>
      <sym name="and">\<and></sym>
      <atom kind="free">B</atom>))
      2. (
      <sym name="lbrakk">\<lbrakk></sym>(
      <atom kind="free">A</atom>
      <sym name="and">\<and></sym>
      <atom kind="free">B</atom>); (
      <atom kind="free">B</atom>
      <sym name="and">\<and></sym>
      <atom kind="free">A</atom>)
      <sym name="rbrakk">\<rbrakk></sym>
      <sym name="Longrightarrow">\<Longrightarrow></sym> (
      <atom kind="free">B</atom>
      <sym name="and">\<and></sym>
      <atom kind="free">A</atom>))</statedisplay>
    </pgml>
  </proofstate>
</pgip>
The following message, generated by the web service upon receiving the above PGIP
message, shows the state result marked-up in processable XML.
<?xml version="1.0"?>
<body>
  <result id="a76">
    <tree step="1" subgoals="2">
      <subgoal id="1">
        <given>
          <bracket>
            <atom kind="free">A</atom>
            <symbol name="and">and</symbol>
            <atom kind="free">B</atom>
          </bracket>
        </given>
        <given>
          <bracket>
            <atom kind="free">B</atom>
            <symbol name="and">and</symbol>
            <atom kind="free">A</atom>
          </bracket>
        </given>
        <goal>
          <bracket>
            <atom kind="free">A</atom>
            <symbol name="and">and</symbol>
            <atom kind="free">B</atom>
          </bracket>
        </goal>
      </subgoal>
      <subgoal id="2">
        <given>
          <bracket>
            <atom kind="free">A</atom>
            <symbol name="and">and</symbol>
            <atom kind="free">B</atom>
          </bracket>
        </given>
        <given>
          <bracket>
            <atom kind="free">B</atom>
            <symbol name="and">and</symbol>
            <atom kind="free">A</atom>
          </bracket>
        </given>
        <goal>
          <bracket>
            <atom kind="free">B</atom>
            <symbol name="and">and</symbol>
            <atom kind="free">A</atom>
          </bracket>
        </goal>
      </subgoal>
    </tree>
  </result>
</body>
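Because the revised message carries the givens and the goal of each subgoal as separate elements, a client can walk the tree directly instead of parsing Isabelle's display text. The Python sketch below illustrates this on a compact copy of the message; it is not the web client's actual code, and the `SYMBOLS` table, the function names and the choice of the turnstile as separator are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Compact copy of the revised message above (XML declaration omitted).
RESULT = """<body><result id="a76"><tree step="1" subgoals="2">
  <subgoal id="1">
    <given><bracket><atom kind="free">A</atom>
      <symbol name="and">and</symbol><atom kind="free">B</atom></bracket></given>
    <given><bracket><atom kind="free">B</atom>
      <symbol name="and">and</symbol><atom kind="free">A</atom></bracket></given>
    <goal><bracket><atom kind="free">A</atom>
      <symbol name="and">and</symbol><atom kind="free">B</atom></bracket></goal>
  </subgoal>
  <subgoal id="2">
    <given><bracket><atom kind="free">A</atom>
      <symbol name="and">and</symbol><atom kind="free">B</atom></bracket></given>
    <given><bracket><atom kind="free">B</atom>
      <symbol name="and">and</symbol><atom kind="free">A</atom></bracket></given>
    <goal><bracket><atom kind="free">B</atom>
      <symbol name="and">and</symbol><atom kind="free">A</atom></bracket></goal>
  </subgoal>
</tree></result></body>"""

# Hypothetical mapping from symbol names to display characters.
SYMBOLS = {"and": "∧"}


def render(node):
    """Render a <bracket>, <symbol> or <atom> element as text."""
    if node.tag == "bracket":
        return "(" + " ".join(render(child) for child in node) + ")"
    if node.tag == "symbol":
        return SYMBOLS.get(node.get("name"), node.text)
    return node.text  # <atom>


def sequents(xml_text):
    """Return one 'givens ⊢ goal' line per <subgoal>."""
    lines = []
    for subgoal in ET.fromstring(xml_text).iter("subgoal"):
        # Each <given> and <goal> wraps exactly one formula element.
        givens = [render(given[0]) for given in subgoal.findall("given")]
        goal = render(subgoal.find("goal")[0])
        lines.append(f"{subgoal.get('id')}. {', '.join(givens)} ⊢ {goal}")
    return lines


if __name__ == "__main__":
    print("\n".join(sequents(RESULT)))
```

Nested formulas would be handled by the same recursion, provided they are encoded as nested `<bracket>` elements; that is an assumption about the format rather than something shown in the message above.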
Appendix B
Natural Deduction Rules
These rule figures were all taken from Tobias Nipkow, Lawrence C. Paulson, and Markus
Wenzel. Isabelle/HOL - A Proof Assistant for Higher-Order Logic, volume 2283 of
Lecture Notes in Computer Science, Springer, 2002.
[Rule figures not reproduced; labels in the order shown:]
∧i, →i, ∃i, ∀i, =i, ¬i, ∨i1, ∨i2, classical, ∧ε1, ∧ε2, ∀ε, ∃ε, ∨ε, =ε1, =ε2, ¬ε, →ε, PBC, LEM
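For orientation, several of the rules labelled above can be written in the usual textbook notation. The LaTeX below gives the standard forms of the conjunction, implication and disjunction-introduction rules; it is a reconstruction from the standard definitions, not a copy of the original figures, and the metavariables P and Q are chosen for the example.

```latex
% Conjunction introduction and elimination
\[
\frac{P \qquad Q}{P \land Q}\;(\land i)
\qquad
\frac{P \land Q}{P}\;(\land e_1)
\qquad
\frac{P \land Q}{Q}\;(\land e_2)
\]

% Implication introduction (discharging the assumption P) and elimination
\[
\frac{\begin{array}{c}[P]\\ \vdots\\ Q\end{array}}{P \to Q}\;(\to i)
\qquad
\frac{P \to Q \qquad P}{Q}\;(\to e)
\]

% Disjunction introduction
\[
\frac{P}{P \lor Q}\;(\lor i_1)
\qquad
\frac{Q}{P \lor Q}\;(\lor i_2)
\]
```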
Appendix C
Questionnaire
Questionnaire for web based, natural deduction proof editor.
The system to be tested is being developed as my MSc summer project. It is meant as a tool for creating formal proofs in Natural Deduction, and is mainly aimed towards novices and non-experts. The system is meant to address the following:
1. Make it easier to visualize proofs.
2. Remove the need for specific knowledge about the underlying interactive theorem prover.
3. Make theorem proving more available in terms of removing the need to install software locally.
The aim of this user test is to evaluate the usability and appropriateness of the system, its stability, and the design of the user interface.
The user test requires you to perform some simple Natural Deduction theorem proving using the web application, and should not take more than 10-15 minutes of your time. In order to take part in this test, you must have access to the Mozilla Firefox web browser (1.5+, 2.0+), and be comfortable with first-order logic.
The data acquired in this questionnaire will be kept anonymous. No personal details will be asked for. If you have any questions, do not hesitate to contact me:
Jonas Halvorsen
Part 1:
1. On a scale from 1-5, how confident do you feel with logic and performing natural deduction proofs?
2. Do you know how to use Fitch-style / Box-style notation of natural deduction proofs?
Note: if you don't know this notation style, you might want to take a quick look at http://www.danielclemente.com/logica/dn.en-node15.html for a short introduction to Fitch-style notation.
3. On a scale from 1-5, how comfortable are you with performing pen-and-paper natural deduction proofs?
4. Do you have any experience with using an interactive theorem prover? If so, list the systems that you have used.
5. Look at the following linear natural deduction proofs. Can you follow them?
[1]
[2]
6. Now look at the proofs below. Can you follow them?
[3]
[4]
7. Which of the two proof styles (question 5 style vs. question 6 style) did you find easier to follow? Why?
Part 2:
1. Load up the BoxProvr webpage in the web browser, and log in. The web address and the login name you have been allocated are written on the first page of this questionnaire.
2. You will now see the main screen without a script loaded. On the left, a menu has appeared with the following sub-headings:
'File': File actions such as save, load, new file.
'View': Actions related to what and how information is displayed.
'Script': Non-proof step script actions, such as creating a new theorem to prove.
'Forward': Forward applied rules available.
'Backward': Backward applied rules available.
'Isabelle': Specific Isabelle rules available that usually don't appear in logic books.
3. Go to 'File->Open File'.
a) Pick the file called demo.thy.
You will now see three proofs. Two of them are coloured green to indicate that they are closed proofs. The remaining one is grey, indicating that it is not finished.
4. Go to the 'Forward' menu, and place the mouse cursor over a button. After a short time period a box will appear with an explanation of the rule. All the buttons in the rule application menus have clue-tips.
5. Complete the unfinished proof by:
a) Clicking on the empty dotted box to select the subgoal to work on.
b) Then click on the assumption labeled '1'.
c) Now click on the '→E' button in the 'Forward' menu. You will now see that the rule has been applied.
d) Finish the subgoal by again clicking the subgoal, and then directly click the 'Ass' button (without selecting any assumptions). The box will now turn green and close, indicating that the proof is completed.
At any time, you can right-click on a line arrived at by a rule application, and press 'undo' in the menu that appears to undo the proof up to that step.
6. Now press the 'File->New File' button. Enter the filename: 'one'. A new script will be created, called 'one.thy'.
7. Press Script->Add Theorem to add a new theorem.
a) Enter the text 'one' into the 'name' box.
b) Enter A; B into the 'Assumptions' box. This will add A and B as two separate givens to the sequent to prove (note: this box can be empty if the sequent to prove does not rely on any assumptions).
c) In the 'Goal' box, enter the character 'A', then press the '/\' button and finally enter the 'B' character (you can also enter '(' and ')' to make the proof easier to read).
d) Press 'Proceed'. The theorem will now appear as a new thing to prove in the script.
8. Select the subgoal as before. Press the 'Backward->/\I' button. Now close both the subgoals by the regular step of selecting the subgoal and applying 'Ass'.
Note: you can, at any time, view the Isabelle proof script by 'Script->View Script'.
Part 3:
Now that you know the basics, try to perform 2-3 of the following proofs. Write down any problems you encounter.
1. A ∧ B → B ∧ A
2. (A → (¬B)) = (B → (¬A))
3. (¬ (∀x. P x)) → (∃x. (¬ (P x)))
4. ¬ (∀x. (P x)) = ∃x. (¬ (P x))
5. ∃y. (∀x. (P x y))   ∀x. (∃y. (P x y))
6. (∀x. (A → (B x))) → (A → (∀x. (B x)))
7. (∃x. (P x)) ∨ (∃x. (Q x))   ∃x. ((P x) ∨ (Q x))
Part 4:
1. Did you manage to use the system? If not, what was the problem?
2. Were there any errors that appeared that prevented you from performing proofs?
3. Do you feel that the system aids in performing natural deduction proofs, in terms of proof representation and not needing knowledge about the prover syntax?
4. Were there any aspects that seemed to go against what you associate with box-proofs?
5. Do you feel that the system would be useful for a novice user, as a tool to perform formal proofs in natural deduction?
6. Would the system be of any use to you personally? Why?
Part 5:
System Usability Scale [5]
© Digital Equipment Corporation, 1986. (released to public domain)
Each statement is rated from 1 (strongly disagree) to 5 (strongly agree).
1. I think that I would like to use this system frequently: 1 2 3 4 5
2. I found the system unnecessarily complex: 1 2 3 4 5
3. I thought the system was easy to use: 1 2 3 4 5
4. I think that I would need the support of a technical person to be able to use this system: 1 2 3 4 5
5. I found the various functions in this system were well integrated: 1 2 3 4 5
6. I thought there was too much inconsistency in this system: 1 2 3 4 5
7. I would imagine that most people would learn to use this system very quickly: 1 2 3 4 5
8. I found the system very cumbersome to use: 1 2 3 4 5
9. I felt very confident using the system: 1 2 3 4 5
10. I needed to learn a lot of things before I could get going with this system: 1 2 3 4 5
Part 6:
1. Note two things that you liked about the system:
2. Note two things you disliked about the system.
3. Any other comments?
Thank you for your time!
[1] Allen, C., & Hand, M. (2001). Logic Primer. Cambridge: MIT Press. P169
[2] Allen, C., & Hand, M. (2001). Logic Primer. Cambridge: MIT Press. P119
[3] Hodkinson, Ian (2006). 140 Logic. [http://www.doc.ic.ac.uk/~imh/teaching/140_logic/140.pdf]
[4] Huth, M., & Ryan, M. (2004). Logic in Computer Science. Cambridge: Cambridge University Press. P118
[5] Brooke, J. (1986). SUS - A quick and dirty usability scale, Digital Equipment Corporation, Ltd.