ONTOLOGICAL REPRESENTATION OF RADIATION TREATMENT DATA

BY

THOMAS M. MINTA

A Thesis Submitted to the Graduate Faculty of

WAKE FOREST UNIVERSITY GRADUATE SCHOOL OF ARTS AND SCIENCES

in Partial Fulfillment of the Requirements

for the Degree of

MASTER OF SCIENCE

Computer Science

August 2011

Winston-Salem, North Carolina

Approved By:

Yaorong Ge, Ph.D., Advisor

Errin W. Fulp, Ph.D., Chair

William H. Turkett, Ph.D.

Acknowledgments

Dr. Yaorong Ge, Dr. Errin Fulp, and Dr. William Turkett,

Your support, advice, and encouragement over the years have been truly remarkable. Year after year, you’ve been incredible references for all that I’ve needed.

Wake Forest University Department of Computer Science

I couldn’t imagine a more supportive department, both faculty and students alike. I sincerely thank everyone who has been involved in my studies in one way or another.

My friends and family

I owe a tremendous amount to all who have provided me guidance and direction not only in my studies, but in life.

Table of Contents

Acknowledgments

List of Figures

Abstract

Chapter 1 Introduction

1.1 Radiation Treatments

1.2 Current Approach to Radiation Treatment Planning

1.3 The Need for a New Treatment Planning Approach

Chapter 2 Semantic Web and Ontologies

2.1 The Semantic Web

2.2 Reasoning and Inferring

2.3 RDF: Resource Description Framework

2.4 OWL: Web Ontology Language

2.5 Applicability of Semantic Web Principles for Treatment Planning

2.6 Comparing Ontologies to Relational Databases

Chapter 3 Radiation Treatment Guideline Ontology

3.1 Radiation Treatment Resources

3.1.1 Existing High-Level Ontologies

3.2 Concept Extraction from QUANTEC Papers

3.3 Challenges Building the structure of RTGO

Chapter 4 RTGO GUI Plug-In Development

4.1 Protege Overview

4.2 Developing Plug-Ins for Protege

4.3 RTGO GUI Plug-In

4.4 Dose-Volume Relationships

4.4.1 Challenges Defining Organ Volume

4.4.2 Challenges Defining Radiation Doses

4.4.3 Challenges Defining Side Effects of Radiation

4.4.4 CTCAE: Common Terminology Criteria for Adverse Events

4.4.5 Acute vs. Late Endpoints

4.5 Features of RTGO

4.5.1 Filters

4.5.2 Additional Study Details

4.5.3 Direct Access to Abstracts and Full Papers

Chapter 5 Software Engineering Analysis

5.1 Software Engineering Dimensions

5.1.1 Accuracy

5.1.2 Efficiency

5.1.3 Scalability

5.2 ISO 9126 Analysis of RTGO

5.3 Software Engineering Analysis Results

Chapter 6 Conclusion and Future Work

6.1 Discussion of Evaluation

6.2 Ontologies in Other Disciplines

6.3 Future Work

6.3.1 Additional Filters

6.3.2 Integration With Journals for Full Text Access

6.3.3 Integration With Existing High Level Ontologies

6.4 Final Words

Bibliography

Vita

List of Figures

2.1 Basic Family Member Ontology Structure

2.2 More Detailed Family Member Ontology Structure

2.3 Family Member Ontology Structure with Relationships

2.4 Inferred Structure of Family Member Ontology

3.1 Extraction of Concepts from QUANTEC Papers

3.2 Graphical Representation of Top Two Layers of RTGO

4.1 Protege Interface

4.2 Protege Interface Editing Class Hierarchy of RTGO

4.3 Accessing the RTGO Plug-in from the Protege Interface

4.4 RTGO Plug-in Interface

4.5 Dose-Volume Histogram for Radiation Effects on the Rectum [17]

4.6 CTCAE Version 4.0 Classification of Gastrointestinal Adverse Events [7]

4.7 Link to View Abstracts and Full Text of Publications

5.1 Results of ontology evaluation. Four series represent the individual evaluator ratings. The fifth series represents the average of the ratings

Abstract

Thomas M. Minta

Radiation treatment, or the use of powerful electromagnetic waves to treat malignant growths in the body, has become very common in the treatment of cancer patients. Radiation, however, can be catastrophic if not utilized with the utmost caution; thus, meticulous treatment plans are absolutely essential. Currently, the primary resources available to physicians for planning the specific doses of radiation to be prescribed are found in published literature of previous studies. As merely textual resources, however, these publications do not lend themselves well to any sort of reasoning, computation, or specialization. Consequently, the knowledge contained within these papers is not being captured in the most usable format. To remedy this, a different approach to the extraction of this knowledge is necessary, coupled with a new model for representation of the knowledge in order to provide data that is both more efficient and more accessible.

An ontology, which can be thought of as a class-based system that builds a web of concepts interconnected by relational properties, lends itself well to this purpose; the physical (and non-physical) entities involved with radiation treatment planning exist as classes, and relationships between these classes are defined by properties. Since an ontology is similar to a database, storing data and relationships between the data, it can serve as a constantly growing source of knowledge. Rather than remaining in static, independent sources, new data is entered into an existing ontology, which can immediately perform calculations and inferencing on the new data. This results in a dynamic knowledge representation model that can continually provide the most relevant data possible. To demonstrate and explore the benefits of an ontological representation of knowledge, the Radiation Treatment Guideline Ontology (RTGO) takes advantage of this new approach. RTGO shows how the knowledge contained within the text of existing publications can be modeled in a way that allows for both computation and specialization when utilized. Using established software engineering criteria, we analyze the correctness, efficiency, and scalability of this approach. In addition, we explore the potential applicability of similar ontological knowledge representation in other disciplines. Ultimately, RTGO shows that building an accurate, efficient, and scalable ontology to represent the knowledge contained within radiation treatment guideline papers provides access to the data in ways that were previously much more difficult, if not impossible.

Chapter 1: Introduction

The power of computation has made truly remarkable gains in the past half-

century. Computer systems today are capable of calculating inordinate amounts of

numerical data at lightning fast speeds. From credit card transaction processing

to airline reservation systems, our world has developed capabilities that bring an

incredible amount of data to our fingertips. When it comes to the storage and use of

non-numeric knowledge, however, progress is not nearly as pronounced. For example,

we have computer systems that can process millions of medical records to analyze any

number of statistics in a patient’s health history, but when it comes to “calculating”

how that history translates to subsequent knowledge, such as a diagnostic plan, our

current abilities are not nearly as accomplished.

1.1 Radiation Treatments

The use of extremely strong waves of energy, commonly known as radiation, to treat

cancerous growths has revolutionized medicine. The ultimate goal of using radiation

is to destroy malignant cells, which are defined as cells that replicate with no regard to

responses from surrounding cells to suspend replication. While normal cells recognize

these signals to cease replication coming from other nearby cells, malignant cells

ignore the signals and can grow into cancerous masses. Radiation is capable of killing

these cells that grow uninhibitedly, but it is equally capable of killing normally

functioning cells, such as cells in vital organs like the heart or lungs.

External beam radiation therapy (EBRT), which was first introduced in the 1950s,

utilizes beams of radiation that can be aimed at specific areas of the body [3]. As

a result of the power and potential dangers of radiation treatment, meticulous planning

of the implementation of radiation treatments is absolutely critical. Too much

exposure of any cell to radiation can kill it, but determining how much is too much

depends on a myriad of factors.

1.2 Current Approach to Radiation Treatment Planning

In order to develop appropriate treatment plans, radiation oncologists utilize a num-

ber of different resources, such as clinical practice guidelines, peer-reviewed research

studies, and medical imagery such as MRI or CT scans [19]. Cells that are functioning

properly (i.e. not part of a malignant growth) make up normal tissue; damage to

this normal tissue should ideally be minimized. With all the external resources, the

physician’s goal is to develop a treatment plan that will minimize these normal tissue

effects, while maximizing damage to the malignant cells. Unfortunately, achieving

this goal is complicated by the fact that different types of tissue in the body (e.g.,

muscle tissue, lung tissue, or bladder tissue) react very differently to radiation [4].

As a result of the complexities involved with radiation treatment, the development

of a successful radiation treatment plan must take into consideration all types of

tissue surrounding the cancerous growth. Many studies on radiation treatment effects

have been published (examples include [16, 17, 27]), and these publications are very

beneficial to physicians in treatment planning. Looking forward, however, the data

could be harnessed in a much more useful format. Two major drawbacks of the data in

its present form are (1) the lack of computability, meaning support for automated

reasoning, manipulation, and analysis, and (2) the difficulty of specialization, meaning

the ability to directly apply existing data to specific cases.

Publications on normal tissue effects of radiation treatment rest on studies that

require significant time to conduct accurately, often 5 to 7 years or more [17].

The results of these studies are commonly presented as recommended doses.

At this point, however, the quantitative analysis does not progress much further.

Once the paper has been published, it is used as a reference, but it is a static piece of

information. As a result, this data contained within literature, being represented only

in narrative text, does not lend itself well to computability; any type of comparing,

contrasting, inferring, or drawing conclusions on the data is a manual process.

In addition to the lack of computability, existing data does not allow for easy

specialization. If a patient, for example, is elderly with emphysema, his lung tissue

will likely experience very different effects than that of a young, healthy patient. Of course,

physicians are not blindly following generalizations, and they are fully capable of

making adjustments based on experience and other sources of knowledge. Modeling

the data in a manner that allows for more detailed specialization, however, would

help tailor treatments to the individual characteristics of the patient at hand.

1.3 The Need for a New Treatment Planning Approach

A simple solution might appear to be a push for more literature to be published

with more specific cases, allowing for a broader reference base, but published litera-

ture in its very nature can be cumbersome and is not always accessible. Taking the

approach of a broader reference base naturally leads to the physician needing to sift

through more papers in an attempt to find data relevant to the case at hand. Referring

to any discipline, Allemang describes the desire to make data “smarter,” meaning

that its representation does not just exist to be referred to later and manually used for

calculations [2]. Rather, the data should be automatically available to other sources

of information and provide the ability for intelligent conclusions to be drawn, without

the need for a person to consciously seek out and apply the data. In order for this to

occur, a new model is needed for storing the data [2]. In the medical field, the neces-

sity for “disciplined modeling” of concepts [20] has become increasingly pronounced

[22]. By disciplined, we are referring to capturing the data in a more structured, scal-

able, and accessible format as opposed to independent publication after publication

containing redundant information.

In order to investigate the application of a more disciplined model to radiation

treatment planning, this thesis explores a new approach: an ontological representation

of the data. Since textual representation of information lends itself to

ambiguity in terminology (e.g., “George Bush” could refer to several different individuals),

it is necessary to explicitly clarify what the term at hand means. Through

the exploration of an ontological representation of data, this thesis demonstrates how

building an accurate, efficient, and scalable ontology to represent the knowledge con-

tained within radiation treatment guideline papers provides access to the data in ways

that were previously much more difficult, if not impossible. Chapter 2 introduces

ontologies, the proposed knowledge structure in which radiation treatment data can

be stored, and discusses the benefits that this structure provides. Chapter 3 presents

the Radiation Treatment Guideline Ontology (RTGO) that was built to model the

data at hand. Chapter 4 introduces the Protege software used for development and

the interface built on top of the ontology. Chapter 5 provides analysis of the tool

developed, and Chapter 6 offers concluding remarks on RTGO and where the tool

can lead in the future.

Chapter 2: Semantic Web and Ontologies

As the name implies, the Semantic Web is a developing technology focused on

representing knowledge in a connected web, with the construction focusing on the

“semantics,” or meaning, of the knowledge. By storing information in this type of

web, based on a structure that reflects the actual entity in the world being represented,

relationships between the data can be better understood [2]. Using the Semantic

Web approach, the needs of computability and specialization of data can be much

better achieved. In the field of radiation treatment planning, this allows previous

publications to serve as calculable sources of data. Ideally, with more useful sources

of data, better treatment planning will follow. Although the Semantic Web approach

may initially seem a bit abstract, it is much more easily understood when considering

real-world example situations.

2.1 The Semantic Web

In a conversation or piece of literature, the name “Thomas Jefferson” would most likely

refer to the third president of the United States of America. On the other hand,

it could also refer to the American Dixieland jazz trumpeter born in 1920, of the

same name. While existing models of knowledge representation (specifically, the

narrative text of published literature) lend themselves to this sort of ambiguity, the

Semantic Web provides a framework for linking a term, such as “Thomas Jefferson,”

to the actual entity in the world to which the term refers. By providing relationships

between entities (such as having a certain ID number, or being married to another

individual, or being located at a certain location), the Semantic Web’s representation

of data allows for inferences to be drawn. In other words, the Semantic Web helps

model data in such a way that it becomes computable.

The structure of the Semantic Web’s representation of data can be thought of

as similar to an object-oriented programming approach. Objects in the world are

represented as a class, such as a Person or a President. Classes can be instantiated

as individuals, such as Thomas Jefferson. Properties can be defined to provide rela-

tionships between classes or individuals. One key aspect, however, is the support for

multiple inheritance: an individual may be a member of multiple classes. The individual

Thomas Jefferson can be defined as a President, as well as a Husband and

a Father. By saying that Thomas Jefferson is a member of the class Husband, the

individual automatically possesses the relational property hasSpouse. Likewise,

since Thomas Jefferson is also a Father, it is known that he has at least one child

(assuming that having a child is our definition of a father). This approach begins

to unlock the true power of the Semantic Web structure, in that class membership

becomes akin to set theory.
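The set-based view of class membership described above can be sketched in a few lines of Python. This is only an illustration, not how an ontology language is implemented; the dictionary layout, the helper names, and the property names heldOffice and hasChild are assumptions made for this example (only hasSpouse appears in the text).

```python
# Hypothetical illustration: each class is a set of individuals,
# and each class contributes properties to its members.
class_members = {
    "President": {"ThomasJefferson"},
    "Husband": {"ThomasJefferson"},
    "Father": {"ThomasJefferson"},
}

class_properties = {
    "President": {"heldOffice"},  # assumed property name
    "Husband": {"hasSpouse"},     # property named in the text
    "Father": {"hasChild"},       # assumed property name
}


def classes_of(individual):
    """Return every class whose member set contains the individual."""
    return {name for name, members in class_members.items() if individual in members}


def properties_of(individual):
    """Union of the properties contributed by each class the individual belongs to."""
    props = set()
    for name in classes_of(individual):
        props |= class_properties[name]
    return props
```

Because membership is just set inclusion, adding Thomas Jefferson to a further class would extend his properties without touching the existing classes.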

When developing an ontology, however, it is important to consider the level of

abstraction necessary to meet the needs of the task at hand. Implementing too few

concepts as classes results in repetition of information in individuals. For example,

if the most detailed class implemented was a Computer, individuals in the ontol-

ogy could include DellLaptop, HPLaptop, MacintoshLaptop as well as DellDesktop,

HPDesktop, and MacintoshDesktop. The repetition in these individuals could have

been solved by creating the subclasses Laptop and Desktop. Implementing too many

classes, however, does not allow for the ontology’s data set (i.e. number of individu-

als) to grow without modifying the class structure. If separate classes were created

for DellLaptop, HPLaptop, MacintoshLaptop, etc. then the class structure would

need to be changed for a ToshibaLaptop to be represented, as opposed to a simple

instantiation of Laptop with a property to identify the manufacturer. As a result,

the Semantic Web provides for the ability to make an efficient and scalable knowledge

structure, but it is crucial to be very cognizant of the level of abstraction necessary.
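The trade-off in the laptop example can be made concrete with a small sketch. It assumes a single Laptop class whose manufacturer is stored as a property; the Python dataclass and the field name manufacturer are illustrative choices, not part of any ontology tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Laptop:
    """One class for all laptops; the manufacturer is data, not structure."""
    manufacturer: str


# Individuals from the example in the text.
laptops = [Laptop("Dell"), Laptop("HP"), Laptop("Macintosh")]

# A new manufacturer is just another individual; no class has to be added.
laptops.append(Laptop("Toshiba"))

manufacturers = [laptop.manufacturer for laptop in laptops]
```

Had DellLaptop, HPLaptop, and MacintoshLaptop each been a class, the last step would have required a change to the class structure rather than a new individual.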

2.2 Reasoning and Inferring

Once data is represented by sets, the possibility of applying reasoning to the data

allows the previously static data to become computable. By “computable,” we mean

that automatic reasoning and drawing of inferences becomes possible. The funda-

mentals of the Semantic Web can be turned into a concrete knowledge representation

called an ontology. As an example, an ontology can be constructed to represent the

members of a family. For a basic Family Member Ontology, a top-level class of Person

is defined. The basic connection between a class and its corresponding subclasses is

the relationship is-a. To represent a family, a FamilyMember can be created as a

sub-class of Person, because every FamilyMember “is a” Person. The FamilyMember

class can contain the subclasses Parent and Child, as shown in Figure 2.1 (note that

all concepts in an ontology automatically belong to the highest level class of Thing).

Figure 2.1: Basic Family Member Ontology Structure

Next, to add additional detail to the ontology, a FamilyMember could be identified

more specifically as a Father, Mother, Son, or Daughter. With these more specific

classes, the concept of gender is introduced. Figure 2.2 shows an expanded ontology,

including these new concepts in addition to the previous generic concepts.

Figure 2.2: More Detailed Family Member Ontology Structure

It becomes clear, however, that Father and Mother really belong as sub-classes of

Parent, and that Son and Daughter belong as sub-classes of Child. Although we

could manually fix this in an example this trivial, allowing the ontology to perform

reasoning on this trivial example demonstrates the possibilities for more complex

ontologies.

In order for the new structure of the ontology to be inferred, relationships must

be defined between classes. To provide relationships between the family members, the

properties isParentOf and isChildOf can be defined. Naturally, a Child isChildOf

a certain Parent and vice versa. The following definitions can be established in the

ontology:

• A Child isChildOf some Parent

• A Parent isParentOf some Child

• A Father isParentOf some Child and hasGender Male

• A Mother isParentOf some Child and hasGender Female

• A Son isChildOf some Parent and hasGender Male

• A Daughter isChildOf some Parent and hasGender Female

With all these relationships defined, the ontology view quickly grows rather

complex, as seen in Figure 2.3.

Figure 2.3: Family Member Ontology Structure with Relationships

Once the asserted relationships have been defined in the ontology, reasoning can

be performed to infer additional relationships. Figure 2.4 shows the new inferred

structure of the ontology (without the isChildOf and isParentOf relationships, for

clarity).

Figure 2.4: Inferred Structure of Family Member Ontology

With its reasoning capabilities, the ontology inferred the correct classification

of Son and Daughter each as a Child, as well as Father and Mother each as a

Parent. As discussed above, it was likely clear from the beginning that Father and

Mother belonged as sub-classes of Parent, and that Son and Daughter belonged

as sub-classes of Child. To demonstrate the reasoning capabilities, however, the

structure was created naively (with Father, Mother, Son, and Daughter simply as

generic FamilyMembers), allowing the ontology to infer the appropriate structure for

us. In a more complicated setting, such as determining the appropriate

representation of a person’s “stomach contents” within an anatomical class structure,

these inferencing capabilities make the model extremely powerful.
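The classification step for the Family Member Ontology can be approximated with a short Python sketch. It treats each class definition as the complete set of conditions its members satisfy and infers a subclass relationship whenever one class's conditions strictly contain another's. This is a deliberate simplification for illustration; an actual OWL reasoner applies far more general description-logic rules, and the function name below is hypothetical.

```python
# Each class is described by the conditions from the definitions listed earlier,
# written as (property, filler) pairs.
definitions = {
    "Parent":   {("isParentOf", "Child")},
    "Child":    {("isChildOf", "Parent")},
    "Father":   {("isParentOf", "Child"), ("hasGender", "Male")},
    "Mother":   {("isParentOf", "Child"), ("hasGender", "Female")},
    "Son":      {("isChildOf", "Parent"), ("hasGender", "Male")},
    "Daughter": {("isChildOf", "Parent"), ("hasGender", "Female")},
}


def inferred_superclasses(name):
    """Classes whose conditions are a proper subset of the named class's conditions."""
    own = definitions[name]
    return {
        other
        for other, conditions in definitions.items()
        if other != name and conditions < own
    }


# Build the inferred hierarchy for every class in the example.
hierarchy = {name: inferred_superclasses(name) for name in definitions}
```

Under these assumptions, Father and Mother are placed under Parent and Son and Daughter under Child, matching the inferred structure in Figure 2.4.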

The power of ontologies goes far beyond simply inferring class structure (i.e. in-

ferring that Son and Daughter are actually subclasses of Child). Given that classes

in an ontology are akin to mathematical sets, properties such as transitivity, reflexiv-

ity, or inversion can be utilized to perform additional computation. For example, an

individual William might be connected to another individual Paul by the property

isSonOf: William isSonOf Paul. The property isSonOf can be identified as a

subproperty of isChildOf, which is the inverse of another property, isParentOf. Thus,

by stating William isSonOf Paul, it can automatically be inferred that Paul isParentOf William. Furthermore, if

Paul also possessed the property hasGender Male, we can go beyond knowing that

Paul isParentOf William; the additional gender property allows us to infer that

Paul isFatherOf William, simply by stating (1) that William is Paul’s child, and

(2) Paul is a male. Additionally, it could be inferred that William is also a Male

given that he is a son; because he is a male, it can further be inferred that he has two distinct (X

and Y) sex chromosomes, and so on.
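The chain of inferences about William and Paul can be sketched as a tiny rule engine over subject-property-object triples. The rules are hand-written for this example and rest on stated assumptions: isSonOf is treated as a subproperty of isChildOf, isChildOf as the inverse of isParentOf, and a male parent as a father. The function name infer is hypothetical, and a real reasoner would derive such rules from the ontology's axioms rather than from hard-coded conditions.

```python
# Asserted facts from the example in the text.
asserted = {
    ("William", "isSonOf", "Paul"),
    ("Paul", "hasGender", "Male"),
}


def infer(facts):
    """Apply the hand-written rules repeatedly until no new triple is produced."""
    facts = set(facts)
    while True:
        new = set()
        for subject, prop, obj in facts:
            if prop == "isSonOf":
                new.add((subject, "isChildOf", obj))      # assumed subproperty rule
                new.add((subject, "hasGender", "Male"))   # a son is male
            if prop == "isChildOf":
                new.add((obj, "isParentOf", subject))     # inverse property rule
            if prop == "isParentOf" and (subject, "hasGender", "Male") in facts:
                new.add((subject, "isFatherOf", obj))     # male parent is a father
        if new <= facts:
            return facts
        facts |= new


inferred = infer(asserted) - asserted
```

Only two facts were asserted, yet the resulting set also contains Paul isParentOf William, Paul isFatherOf William, and William hasGender Male.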

Applying these types of inference capabilities to the radiation treatment planning

domain makes it possible to infer relationships that may not have previously been

discovered. The transitivity property, for example, can allow for new correlations to

be discovered as a result of representing the knowledge in a Semantic Web format.

For example, an elevated white blood cell count might never have been previously

considered related to a particular disease. While it is certainly known that correlation

does not always prove causation – and this thesis is not presented as a method of

blindly applying the property of transitivity – this new model provides the framework

for this type of analysis to begin to be investigated.
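As a rough sketch of how a transitive property surfaces candidate relationships, the following Python computes the transitive closure of a set of asserted pairs. The property name isAssociatedWith and the placeholder entities FindingA, ConditionB, and DiseaseC are hypothetical and carry no clinical meaning; any pair produced this way is a candidate for investigation, not an established result.

```python
# Hypothetical assertions of a transitive property, e.g. isAssociatedWith.
asserted = {
    ("FindingA", "ConditionB"),
    ("ConditionB", "DiseaseC"),
}


def transitive_closure(pairs):
    """Repeatedly join (a, b) with (b, d) until no new pair appears."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


# Pairs that were never asserted directly but follow from transitivity.
candidates = transitive_closure(asserted) - asserted
```

Here the only candidate is the pair linking FindingA to DiseaseC, which no single source stated on its own.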

2.3 RDF: Resource Description Framework

In the full scope of the Internet, the World Wide Web Consortium (W3C) has agreed

on various specifications for implementing the Semantic Web ideals. These specifica-

tions are contained within a framework known as the Resource Description Framework

(RDF). RDF has become the accepted framework standard, and it relies on Uniform

Resource Identifiers (URIs) to uniquely identify a certain concept or term. To

eliminate the ambiguity of basic text, a reference is not made simply to “Thomas Jefferson.”

Rather, RDF borrows from existing web technology; the syntax of a URI is

very similar to that of a URL [2]. For example, a URI might exist as

http://www.organization-a.org/Presidents.owl#ThomasJefferson

In the preceding example, the file “Presidents.owl” might be consolidated informa-

tion about Presidents published as an ontology by Organization A. If the information

in this ontology was found to be useful, accurate, or beneficial, Organization B could

“link” to this source. Unlike traditional hyperlinks to URLs found on the current

World Wide Web, however, the link is not a destination that must be visited in order

to access its data. Rather, the link can be thought of as a process of embedding the

data in the active publication. Thus, Organization B has direct access to the infor-

mation published by Organization A. If Organization A discovers more data about

a President and loads it into their source, Organization B’s page can automatically

reflect those changes, since the data source is linked. Of course, the question of trust

naturally arises. Organization B must trust Organization A to maintain a

reliable and accurate source of information. This trust issue, however, is no different

from the current World Wide Web. Any web site is free to link to any other web

site, and it is the responsibility of the originating site to determine the validity of the

destination site. Understandably, the National Institutes of Health would likely not

link to a middle school student’s article on cancer as a reputable source, although the

NIH is certainly free to do so. The Semantic Web allows for the same freedom, based

on the same issue of trust and reputability.
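The linking behavior between Organization A and Organization B can be illustrated with triples keyed by URI. The sketch below uses plain Python sets rather than an RDF library; the namespace for Organization B, the property names presidentNumber, featuredIn, and bornIn, and the helper statements_about are all hypothetical names introduced for this example.

```python
# Namespace taken from the example URI in the text.
ORG_A = "http://www.organization-a.org/Presidents.owl#"
# Hypothetical namespace for Organization B.
ORG_B = "http://www.organization-b.example/History.owl#"

JEFFERSON = ORG_A + "ThomasJefferson"

# Statements published by Organization A.
org_a_graph = {
    (JEFFERSON, ORG_A + "presidentNumber", "3"),
}

# Organization B adds its own statement about the same URI instead of copying A's data.
org_b_graph = {
    (JEFFERSON, ORG_B + "featuredIn", ORG_B + "Exhibit1"),
}


def statements_about(uri, *graphs):
    """Collect every triple whose subject is the given URI across the linked graphs."""
    return {triple for graph in graphs for triple in graph if triple[0] == uri}


before = statements_about(JEFFERSON, org_a_graph, org_b_graph)

# Organization A later publishes a new fact; B's view reflects it without any copy step.
org_a_graph.add((JEFFERSON, ORG_A + "bornIn", "Virginia"))
after = statements_about(JEFFERSON, org_a_graph, org_b_graph)
```

Because both graphs refer to the same URI, the term is unambiguous, and Organization B sees the new statement as soon as Organization A's source contains it.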

As a more realistic example, a popular restaurant chain might have an information

source with each of its locations. A local city’s website might be interested in listing

restaurants available in the city. If the restaurant chain opens an additional location

in the city, a disconnect between information will occur unless the city webmaster

actively updates the website with the new location. With a Semantic Web model,

however, if the city were able to access the URI of the restaurant chain’s locations, the

information on the city website would always remain accurate with the restaurant’s

records. Translating this example to the process of radiation treatment planning,

the presence of new study data can be included in the knowledge base without any

repetition of previously defined concepts. If desired, calculations can begin including

this data immediately, without the need for a physician or organization to manually

locate and obtain the new study results. This automatic synchronization is what

contributes to the definition of data being “smarter” than static data that can easily

become out-of-date or inaccurate.

It becomes clear that the use of Semantic Web modeling strongly encourages the

sharing and collaboration of data; however, it still allows for the freedom for anyone

to say anything about anything, which is considered to be an absolute necessity for

the World Wide Web to function as it does today [2]. In the example above, if

Organization A had published inaccurate information about Presidents, nothing would

stop Organization B from publishing its own data about Presidents. Then,

as time goes on, the data found to be the most useful, which is likely the most

accurate and complete, will naturally find itself linked to more often, thus aiding in

the collaborative construction of accurate datasets [2].

Creating smarter and more useful data through linking URIs is one substantial

benefit of the Semantic Web, but it is far from the only benefit. As introduced in

Section 2.2, the data being stored starts to become computable. Multiple inheritance

allows an individual to possess properties from several different classes. Defining

relationships with other classes allows for new connections to be inferred through

mathematical properties such as set intersection, union, or transitivity. A variety of

knowledge representation languages have been developed based on the RDF construct,

with varying degrees of capabilities. The basic RDF Schema (RDFS) has a relatively

limited set of properties with which to work, such as Domain, Range, and Member.

RDFS-Plus expands upon these properties with set unions and intersections. The

Web Ontology Language (OWL) is an even more powerful language, adding additional

properties such as “someValuesFrom” or “allValuesFrom.” Because of the powerful

inference capabilities of OWL, it is currently used very widely in the development of

Semantic Web knowledge bases [2].

2.4 OWL: Web Ontology Language

OWL provides developers with a way to represent knowledge in an extremely useful

format – much more useful than typical narrative text. With access to the full set

theory language (intersections, unions, complements, etc.), classes can be built by

defining restrictions on membership, such as “States that border an ocean,” or “Cells

that participate in striated muscle contraction.” With this type of set inclusion (or

exclusion), knowledge begins to become computable. With data being entered into

a Semantic Web model, new analyses can be performed simply by manipulating the

restrictions on various classes.
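As a rough analogy, membership restrictions of this kind behave like operations on sets. The sketch below uses ordinary Python sets rather than OWL axioms, and the individuals listed are illustrative assumptions.

```python
# Illustrative individuals only; a real ontology would assert these as OWL axioms.
states = {"California", "Florida", "Kansas", "Nevada"}
borders_ocean = {"California", "Florida", "Portugal"}  # not limited to states

# Intersection: "States that border an ocean".
coastal_states = states & borders_ocean

# Difference (complement within the class of states):
# "States that do not border an ocean".
landlocked_states = states - borders_ocean

print(sorted(coastal_states))     # ['California', 'Florida']
print(sorted(landlocked_states))  # ['Kansas', 'Nevada']
```

Changing the analysis is then a matter of changing the restriction, for example swapping the intersection for a difference, without touching the underlying data.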

In addition to the application of set theory concepts, OWL also provides the

ability to define cardinality restrictions on the data, such as “States that border

more than 3 other states,” or “Radiation treatments causing ulceration lasting longer

than 7 days.” With cardinality restrictions, OWL provides a system that can offer

specialization in retrieving relevant data. The need to sift through literature for data

relating to a specific patient, such as an infant, can essentially be eliminated; a query

to the knowledge base can be limited by practically any factor desired. OWL served as

the foundational language for building the Radiation Treatment Guideline Ontology

(RTGO), as discussed in Chapter 4.
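A cardinality restriction can be sketched in the same informal way: a class is defined by how many values a property has. The following Python fragment is only an analogy to the OWL construct, and the adjacency data covers just three states chosen for illustration.

```python
# Illustrative "borders" property values for three states.
borders = {
    "Tennessee": {"Kentucky", "Virginia", "North Carolina", "Georgia",
                  "Alabama", "Mississippi", "Arkansas", "Missouri"},
    "Florida": {"Georgia", "Alabama"},
    "Maine": {"New Hampshire"},
}

def with_more_than(property_values, minimum):
    """Return the members that have more than `minimum` values for the property."""
    return {member for member, values in property_values.items()
            if len(values) > minimum}

# "States that border more than 3 other states".
print(with_more_than(borders, 3))  # {'Tennessee'}
```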

2.5 Applicability of Semantic Web Principles for Treatment Planning

Representing data within the Semantic Web, a model that defines entities and

their relationships, rather than in static literary publications not only provides

more direct and comprehensive access to data, but also offers the ability to uncover

new conclusions or relationships that previous knowledge representations could not

produce. In the domain of radiation treatment planning, this allows a system to

be developed in which more relevant and specific study data can be recorded,

such as the effects of radiation on the lung tissue of a patient suffering from

emphysema. Once this framework is in place, the knowledge stored in an ontology

(1) becomes computable, allowing new correlations to be discovered between patient

characteristics previously thought to be unrelated, and (2) enables specialization,

with relationships allowing for the extraction of more relevant data.

2.6 Comparing Ontologies to Relational Databases

Comparisons are often drawn between ontologies and relational databases.

Conceptually, the two are relatively similar: both provide a way to store data in an

organized format and define relationships between the data. The composition of an

ontology, however, can be viewed as “semi-structured natural language texts” as

opposed to simply tabular databases [11]. The statement from the example above, Paul

isFatherOf William, demonstrates this semi-structured text. In a way, an ontology

can be viewed as a data model that provides an interface to a data source, such

as a relational database. Ontologies, however, do not simply store data and pro-

vide explicitly defined relationships between the data. Rather, ontologies allow for

queries that can “reason about the asserted facts and retrieve new facts implied by

the known facts” [11]. The inherent ability for re-use and interoperability also makes

a very strong case for ontologies over relational databases. Databases are frequently

developed by organizations for internal purposes and are not geared towards the reuse

of data in the future. Since the nature of ontologies is based on representing concepts

in terms of their actual meaning, it is much easier to consider the possibility of reusing

part or all of an existing ontology.
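The difference between storing facts and retrieving implied facts can be sketched in a few lines of Python. The triple Paul isFatherOf William comes from the earlier example; the inverse property name hasFather is a hypothetical addition for illustration, and the code is an analogy rather than an OWL reasoner.

```python
# One asserted fact, stored as a (subject, property, object) triple.
asserted = {("Paul", "isFatherOf", "William")}

# Hypothetical declaration: hasFather is the inverse of isFatherOf.
inverse_of = {"isFatherOf": "hasFather"}

def infer_inverses(triples, inverse_of):
    """Return triples implied by inverse-property declarations but not asserted."""
    inferred = set()
    for subject, prop, obj in triples:
        if prop in inverse_of:
            inferred.add((obj, inverse_of[prop], subject))
    return inferred - triples

print(infer_inverses(asserted, inverse_of))
# {('William', 'hasFather', 'Paul')}
```

A relational table holding only the asserted row would return nothing for a query on hasFather unless that row had also been entered explicitly.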

Chapter 3: Radiation Treatment Guideline Ontology

The ontology developed in this thesis is the Radiation Treatment Guideline Ontology

(RTGO), which was based on studies aimed at obtaining data on the side effects

of radiation treatments [4, 17, 27]. These studies have been published in various jour-

nals, and are often focused on the effects when treating a specific type of cancer, such

as prostate cancer or bladder cancer. An ontology was used to store all the concepts

contained within these papers to reap the benefits of modeling data in this structure,

as discussed in Chapter 2.

3.1 Radiation Treatment Resources

As a starting point for determining the structure of RTGO, several literary resources

were used. While many papers have been published with results from studies on

the effects of radiation treatments, there are organizations that consolidate these

studies and publish summaries. The American Association of Physicists in Medicine

produces publications labeled as Quantitative Analysis of Normal Tissue Effects in

the Clinic (QUANTEC). Many organ-specific QUANTEC papers have been produced,

providing summaries of radiation treatment studies on a specific organ. These papers

are specifically focused on dose-volume effects, which analyze the maximum dose

recommended that a certain volume of the organ should be exposed to. To build

RTGO and begin a structure allowing for knowledge computability and specialization,

two organ-specific QUANTEC papers were obtained, analyzing dose-volume effects

on the bladder and on the rectum [17, 27].

3.1.1 Existing High-Level Ontologies

One of the great advantages of ontologies is that they can be structured together,

with new, domain-specific ontologies being built underneath an existing, high-level

ontology. The biomedical field has many high-level ontologies already developed, such

as the Foundational Model of Anatomy (FMA) and the Systematized Nomenclature of

Medicine (SNOMED). Data from both the FMA and SNOMED were accessed using

the National Center for Biomedical Ontology (NCBO) BioPortal [18].

Developed by the University of Washington School of Medicine, the FMA is an

ontology focused on the domain of anatomy. The FMA contains approximately 75,000

classes with over 120,000 terms and 2.1 million relationship instances, making it one

of the largest ontologies in the field [9]. The scale of ontologies like the FMA makes

clear that collaboration and reuse of knowledge are among the greatest assets of

ontologies. It would likely be foolish to consider re-developing an

entire anatomical ontology (unless, for example, the FMA was inaccurate and one

wanted to recreate the ontology from scratch). For development of RTGO, the high-

level classifications for anatomical concepts were implemented in the same structure

as in the FMA.

While the FMA is a relatively specific ontology, SNOMED is wider-scale, cover-

ing many aspects of clinical medicine, including pathology and treatment concepts.

SNOMED was originally designed by the College of American Pathologists (CAP),

but it is now owned and maintained by the International Health Terminology Stan-

dards Development Organization (IHTSDO). With just under 400,000 classes defined,

SNOMED is a very comprehensive ontology [23]. Much of the clinical terminology

(e.g., blood pressure) and pathological terminology (e.g., malignant growth) in RTGO

was implemented using the same class hierarchy as SNOMED.

3.2 Concept Extraction from QUANTEC Papers

To accurately represent all the knowledge contained within the QUANTEC papers, a

very meticulous process of concept extraction was performed. Every concept (such as

bleeding, blood, blood cells, surgical procedures, injuries, etc.) needed to be extracted

in order to model the knowledge correctly. Although techniques for automated con-

cept extraction using natural language processing are in development, these methods

are not foolproof yet [5]. In the future, concept extraction for ontologies like RTGO

could potentially make use of automated concept extraction software, but for the

current study, five or more manual passes were performed on each paper to extract

concepts contained within. As an example, a total of 351 concepts were extracted

from the QUANTEC paper on the bladder [27], and using the FMA and SNOMED

ontologies as a guide, the concepts were organized in outline format. The two highest

levels in the hierarchy of concepts are shown in Figure 3.1. At its deepest, the

hierarchy currently in RTGO extends nine layers below the top-level concepts.

Once the concepts were laid out on paper, they were added as classes to the ontol-

ogy. Figure 3.2 shows a portion of the structure of these top two levels in a graphical

representation of the ontology. To maintain clarity, only the classes in the ontology

were included in the figure, but dozens of relational properties interconnect the ma-

jority of these classes with each other. Although this process of concept extraction

was relatively time-consuming for the two QUANTEC papers, future extraction will

be significantly easier. Because of the reusability of existing terms in an ontology,

when reviewing a new paper, only concepts that were brand new would need to be

implemented. In the case of the QUANTEC papers, 351 concepts were extracted

from the paper on the bladder, while only 48 additional concepts were added to the

ontology to represent the knowledge contained within the paper on the rectum. It is

this elimination of repetition that provides ontologies with much greater scalability

Figure 3.1: Extraction of Concepts from QUANTEC Papers

compared to the current approach of literary publications.
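The effect of reuse on extraction effort can be sketched with a set difference: only concepts absent from the ontology need to be added when a new paper is reviewed. The concept names below are a small hypothetical sample, not the 351 and 48 concepts actually extracted.

```python
def add_paper(ontology, paper_concepts):
    """Add only the concepts not already in the ontology; return the new ones."""
    new_concepts = paper_concepts - ontology
    ontology.update(new_concepts)
    return new_concepts

ontology = set()

# Hypothetical concept samples from two papers.
bladder_paper = {"Bladder", "Bleeding", "Dose", "Volume", "Endpoint"}
rectum_paper = {"Rectum", "Bleeding", "Dose", "Volume", "Endpoint", "Fistula"}

added_first = add_paper(ontology, bladder_paper)
added_second = add_paper(ontology, rectum_paper)

print(len(added_first), len(added_second))  # 5 2
```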

3.3 Challenges Building the Structure of RTGO

As introduced in Section 2.1, care must be taken to determine the level of abstrac-

tion necessary for the domain at hand. Although there is no one correct structure

Figure 3.2: Graphical Representation of Top Two Layers of RTGO

for determining the level of abstraction, one structure may represent the data more

easily than another. The primary consideration is to determine whether a concept

(e.g., stomach, publication, electromagnetism) should be represented as a class or an

individual.

One example of this decision for RTGO was determining how to represent a body

organ like a lung. As a class, Lung allows for other concepts to be associated with

it, such as: Emphysema occursIn Lung. As an individual, Lung could be an instan-

tiation of the generic class Organ. There is no universally correct answer to this

issue - depending on the needs and desired functionality of the ontology, Lung could

be implemented either as its own class, or as an instantiation of an organ. Similarly,

MalignantCell could be represented as either a sub-class of the Cell class, or it could

simply be an instantiation of the Cell class, with a descriptive property isMalignant

containing the boolean value of True. In general, concepts were added to RTGO as

classes whenever possible. Only when it was clear that no further sub-division of the

concept would likely be introduced was a concept added as an individual.
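The two modeling options for MalignantCell can be contrasted with a loose Python analogy. Python classes are not OWL classes, and MalignantEpithelialCell is a hypothetical name introduced only to show what further sub-division would look like.

```python
class Cell:
    """Individual-based option: one class, with malignancy as a boolean property."""
    def __init__(self, is_malignant=False):
        self.is_malignant = is_malignant

class MalignantCell(Cell):
    """Class-based option: malignancy is its own class and can be subdivided later."""
    def __init__(self):
        super().__init__(is_malignant=True)

class MalignantEpithelialCell(MalignantCell):
    """A hypothetical further sub-division, available only with the class-based option."""

as_class_member = MalignantCell()
as_individual = Cell(is_malignant=True)

# Both record the same fact, but only the class-based form is a MalignantCell.
print(as_class_member.is_malignant, as_individual.is_malignant)  # True True
print(isinstance(as_individual, MalignantCell))                  # False
```

This mirrors the rule adopted for RTGO: a class leaves room for later sub-classes, whereas an individual does not.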

Chapter 4: RTGO GUI Plug-In Development

The Protege Ontology Editor and Knowledge Acquisition System was chosen as

the software to be used to develop RTGO and an associated GUI Plug-In. Protege is

a free, extensible, Java-based, open-source ontology editor developed at the Stanford

Center for Biomedical Informatics Research. Continued development on the Protege

project is supported by a grant from the United States National Library of Medicine

[24].

4.1 Protege Overview

Protege is a very widely-used application for developing ontologies, with an active

support community of over 176,000 users as of July 2011. Being Java-based, Protege

is platform independent, allowing ontologies to be built and distributed regardless

of the operating system of the user [24]. Simply, Protege provides a graphical user

interface through which the user is able to develop an ontology. Figures 4.1 and 4.2

show the standard Protege version 4.1 interface, which was used for development (see footnote 1).

4.2 Developing Plug-Ins for Protege

The extensible nature of Protege allows any user to develop a plug-in for the pro-

gram. The Protege Programming Development Kit (PDK) provides documentation

and sample code for developers to utilize when developing plug-ins [21]. The Eclipse

IDE (version Helios, Service Release 1) was used for coding the RTGO plug-in. It is

Footnote 1: Previous versions of Protege were focused around frames, which allow for similar representation of classes, relationships, and individuals, but lack the inferencing capabilities of the Web Ontology Language (OWL). Current versions of Protege create files in OWL format, which brings reasoning capabilities to the ontology being developed.

Figure 4.1: Protege Interface

common for plug-ins to be developed for domain-specific ontologies, since the appli-

cability of the ontology rests upon providing quick access to the most usable aspects

of the ontology. In the case of RTGO, it was desirable to provide quick access to the

dose-volume relationships from previous studies. Running Protege with the RTGO

plug-in developed, a new tab was added to the main Protege interface, allowing the

user to activate the plug-in. This tab can be seen in Figure 4.3.

4.3 RTGO GUI Plug-In

Although hundreds of concepts need to be included in the ontology in order to accu-

rately model radiation treatment planning, the primary focus in treatment planning

studies is often to reach a dose-volume recommendation. In other words, the study

aims to recommend certain doses of radiation that should be applied to a particular

Figure 4.2: Protege Interface Editing Class Hierarchy of RTGO

Figure 4.3: Accessing the RTGO Plug-in from the Protege Interface

volume of the organ at hand. The GUI built on top of RTGO focuses directly on

these dose-volume relationships, allowing for dose-volume graphs to be displayed for

a particular organ.

The QUANTEC papers produce recommendations based on conclusions drawn

from multiple individual studies, but since one of the goals of RTGO is to improve

the capacity for specialization, dose-volume recommendations from individual studies

are plotted as well. Figure 4.4 demonstrates the RTGO GUI plug-in interface, with

a plot of the QUANTEC recommended dose-volume curve on the left, and a plot of

multiple dose-volume recommendations from individual studies on the right.

Figure 4.4: RTGO Plug-in Interface

4.4 Dose-Volume Relationships

As the interface of the RTGO GUI plug-in is centered around displaying dose-volume

relationships, determining how to represent “Volumes” and “Doses” in the ontology

is of pivotal importance. In general, the dose of a particular radiation treatment is

measured in the unit of grays (abbreviated: Gy). One Gy is defined as the absorption

of one joule of radiation by one kilogram of matter:

$$1\ \mathrm{Gy} = \frac{1\ \mathrm{J}}{1\ \mathrm{kg}} \tag{4.1}$$

When a certain volume (or all) of an organ is exposed to radiation, this can be

viewed as a dose-volume relationship. A basic dose-volume plot can be constructed

relating the dose (in Gy) to the percent volume being exposed to the dose, as shown

in Figure 4.5.
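A dose-volume curve of this kind can be held as a list of (dose, percent volume) points. The sketch below reads a value off such a curve by linear interpolation; the function name and the data points are invented for illustration and are not clinical values or the plug-in's actual code.

```python
def volume_at(curve, dose_gy):
    """Linearly interpolate the percent volume at `dose_gy` on a dose-volume curve."""
    points = sorted(curve)
    # Outside the measured range, hold the nearest endpoint value.
    if dose_gy <= points[0][0]:
        return float(points[0][1])
    if dose_gy >= points[-1][0]:
        return float(points[-1][1])
    for (d0, v0), (d1, v1) in zip(points, points[1:]):
        if d0 <= dose_gy <= d1:
            return v0 + (v1 - v0) * (dose_gy - d0) / (d1 - d0)

# Invented (dose in Gy, percent volume) points; not clinical values.
curve = [(40, 60), (60, 40), (80, 20)]
print(volume_at(curve, 50))  # 50.0
```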

Given that the RTGO interface is primarily designed to display these Dose-Volume

Figure 4.5: Dose-Volume Histogram for Radiation Effects on the Rectum [17]

Relationships to the user, careful consideration must be taken regarding the quan-

tities actually being graphed. Although it initially may not seem complicated, the

determination of the definition of organ volumes, radiation doses, and even side effects

is rather complex. Building a tool that displays graphs based around these concepts

requires an understanding of what each measurement or description truly represents.

4.4.1 Challenges Defining Organ Volume

Attempting to estimate the volume of an organ receiving a particular dose is far

from a trivial issue. At the outset, the mere definition of a particular organ may

not always be explicitly definable [17]. For example, the concept of an “esophagus”

might seem simple enough, often defined as the organ through which food passes

from the pharynx to the stomach. It is not necessarily defined, however, whether the

definition of the esophagus should include the esophageal walls. Along a similar line of

thinking, a representation of a stomach or rectum could be either with or without its

contents. These are uncertainties that, to a certain degree, have been answered over

many years in the development of the FMA and SNOMED, but they demonstrate

the complexities of concept extraction in developing an ontology.

Even when the physical definition of an organ has been established, other issues

arise as well. Many organs, such as the stomach, rectum, or lungs, are distensible,

meaning that they can swell when filling with liquid, solid, or gaseous contents. Thus,

even in the same individual many organs do not have a fixed volume. Rather, their

volumes can fluctuate based on contents, body position, physical exertion, or normal

bodily functions such as digestion. Some of these fluctuations are slow, during normal

body growth and development, but some are more rapid, such as the lungs inflating

and deflating with each breath [27].

Although these and other challenges are faced when determining an organ’s vol-

ume, these complications have been analyzed extensively [14, 17, 27] in order to reach

the closest possible approximations in data being reported. The specifics of reaching

these volume approximations extend beyond the scope of this paper, so the volume

data reported in the QUANTEC and other studies are taken to be sufficiently accu-

rate. For the development of RTGO, the issue of the definition of organ volume is

left up to the institutions carrying out the studies.

4.4.2 Challenges Defining Radiation Doses

Similar to defining an organ volume, many difficulties are encountered when deter-

mining the actual dose to which an organ is exposed. Although the electromagnetic

wave may be delivered at a particular dose, absorption, dissipation,

and dispersion of the energy begin immediately.

Very complex calculations are performed to analyze the precise geometric factors

in planning the radiation absorption. Determining the dose in a planned target volume

(PTV) for the treatment involves arranging radiation beams into 7 or more fields and

calculating the appropriate dose fractions to be applied by each field – applied over

the course of many days, a total dose of 70 Gy might be divided into fractions of 2.12

Gy [10].
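As a worked check of the figures quoted above, a total dose of 70 Gy delivered in fractions of 2.12 Gy corresponds to roughly 33 fractions. The short Python sketch below carries out this arithmetic along with the definition of the gray from Equation 4.1; the function names are assumptions made for illustration.

```python
def absorbed_dose_gy(energy_joules, mass_kg):
    """Absorbed dose in gray: joules of energy absorbed per kilogram of matter."""
    return energy_joules / mass_kg

def fraction_count(total_dose_gy, dose_per_fraction_gy):
    """Nearest whole number of fractions needed to deliver the total dose."""
    return round(total_dose_gy / dose_per_fraction_gy)

n = fraction_count(70.0, 2.12)
print(n)                            # 33
print(absorbed_dose_gy(1.0, 1.0))   # 1.0
```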

Similarly, determining the “true dose” being applied to the tissue (DA) is an

extremely complex process. During the course of treatment, (DA) can change as both

the tumor and normal tissue are deformed by the radiation. Once the radiation enters

the body, each type of tissue (e.g., skin, blood vessels, muscle) absorbs the energy at a

different rate. Since the radiation may need to pass through multiple types of tissue

before reaching the desired target organ, very complex calculations need to be made

to determine the actual dose reaching the organ [14].

As with volume definitions, dose determinations are approximated as closely as

possible, but are not absolutely precise. Again, these dose calculations are beyond

the scope of this paper, but they have been studied extensively [14] and are taken to

be sufficiently accurate.

4.4.3 Challenges Defining Side Effects of Radiation

The determination of how side effects should be represented in the Semantic Web is

yet another process that takes careful consideration. Working off Bentzen’s defini-

tions, endpoints can be defined as “health state characteristics that are used to assess

treatment outcome in a population of patients” [4]. These endpoints can be symp-

toms, signs, or actual measurements obtained from the patient. When dealing with

normal tissue effects, endpoints are side effects that affect, “health-related quality of

life” [4].

Understandably, however, there can be a wide range of severity for a particular

endpoint. The endpoint bleeding, for example, may be a simple drop of blood or a

loss of fatal amounts of blood. Due to the subjectivity of endpoints, representing the

varying levels of side effects can be difficult, but several standards organizations have

published guidelines for classifying endpoints.

4.4.4 CTCAE: Common Terminology Criteria for Adverse Events

The Common Terminology Criteria for Adverse Events (CTCAE) is one such publi-

cation that is frequently used [17] for reporting clinical observations during studies.

Figure 4.6 shows part of the CTCAE publication, version 3.0. Five levels, or Grades,

are given for a specific adverse event, with 1 being the least severe and 5 being the

most severe (usually death).

The QUANTEC papers used in building RTGO presented side effect summaries

using CTCAE guidelines (typically, any effects classified as greater than or equal

to level 2 were included in statistics), but other similar publications for classifying

endpoints are in existence. The Radiation Therapy Oncology Group [17] and the Late

Effects of Normal Tissues-Subjective, Objective, Management and Analytic system

(LENT-SOMA) [27] have been published, but are not used as widely as CTCAE [17].

With the nature of the expandable ontology, however, RTGO has the capability to

include these, or other, endpoint classification guidelines if studies are produced with

results reported using a guideline other than CTCAE.
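The grade threshold described above can be sketched as a simple filter over observed endpoints. The observations and the function name are hypothetical; only the 1 to 5 grade range and the cut-off at grade 2 come from the text.

```python
def reportable(observations, minimum_grade=2):
    """Keep (endpoint, grade) pairs whose grade is at or above `minimum_grade`."""
    for _, grade in observations:
        if not 1 <= grade <= 5:
            raise ValueError(f"grade {grade} is outside the 1-5 range")
    return [(name, grade) for name, grade in observations
            if grade >= minimum_grade]

# Hypothetical observations recorded as (endpoint, grade).
observations = [("bleeding", 1), ("bleeding", 3), ("ulceration", 2)]
print(reportable(observations))  # [('bleeding', 3), ('ulceration', 2)]
```

Supporting another classification guideline would amount to supplying a different grade scale and threshold, leaving the observations themselves unchanged.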

4.4.5 Acute vs. Late Endpoints

Even with the classification of endpoints into grade levels according to severity, signif-

icant consideration must be given to the time when the endpoint occurs. Endpoints

Figure 4.6: CTCAE Version 4.0 Classification of Gastrointestinal Adverse Events [7]

are generally divided into Acute Endpoints and Late Endpoints. The former refers

to an endpoint that occurs immediately after the treatment is administered, while

the latter refers to long-term effects, occurring months or years after the treatment.

Late endpoints are often the most difficult to obtain data about, because patients are

often no longer being actively treated for the original issue or are being treated by a

different physician [13].

This aspect of late endpoints demonstrates another drawback to static, literary

publications. If a study has already been published, and a summary paper has already

been published based on the original study, the data in that summary paper is static.

With the dynamic nature of an ontology, however, new data can be entered into the

system, and new conclusions or inferences can be made immediately.

4.5 Features of RTGO

The RTGO Plug-in interface provides a GUI that queries data stored in the ontology.

Both QUANTEC summary data and individual study data are contained within the

ontology. Every study contains dose-volume data points to construct the correspond-

ing curve on the Dose-Volume histogram.

4.5.1 Filters

In order to meet the goal of modeling data in a manner conducive to specialization

(and given that there are many different factors that go into a radiation treatment

study), the ability to filter data according to specific parameters is crucial. The

implementation of search filters leads to providing a physician with relevant data,

assisting in the development of the best possible treatment plan. RTGO provides two

key filters: a date range filter and a study size filter.

Since new discoveries in the medical field might lead to some older studies being

less relevant, the papers returned from the query can be limited by their date of

publication. Similarly, if a physician wants to exclude small (or large) studies, the

number of patients in the study can also be filtered. These two filters demonstrate

the power of representing data in an ontology. A near-infinite number of filters could

be implemented, allowing for an even more powerful tool. Attempting to locate

specialized data relevant to the specific case at hand is rather burdensome when

navigating literary publications; however, with an ontological storage model, access

to fine-tuned data is nearly instantaneous.
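The two filters can be sketched as follows. This is a Python illustration under assumed names (Study, filter_studies) with invented study records; the actual plug-in was written in Java against the Protege API and queries the ontology rather than a list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Study:
    title: str
    year: int       # year of publication
    patients: int   # number of patients in the study

def filter_studies(studies, min_year=None, max_year=None,
                   min_patients=None, max_patients=None):
    """Return studies inside the given date range and study-size range.

    A bound left as None is not applied.
    """
    def keep(study):
        return ((min_year is None or study.year >= min_year)
                and (max_year is None or study.year <= max_year)
                and (min_patients is None or study.patients >= min_patients)
                and (max_patients is None or study.patients <= max_patients))
    return [study for study in studies if keep(study)]

# Invented records for illustration.
studies = [
    Study("Study A", 1995, 40),
    Study("Study B", 2004, 250),
    Study("Study C", 2009, 35),
]

recent_and_large = filter_studies(studies, min_year=2000, min_patients=100)
print([s.title for s in recent_and_large])  # ['Study B']
```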

4.5.2 Additional Study Details

While the dose-volume data is the most straightforward piece of data gleaned from a

radiation treatment study, there are many factors contained within the papers that

may be of use to the user. The Semantic Web model of an ontology facilitates access

to this information: each paper is represented as an instance of the class Publication.

All bibliographic information about the paper is also contained within the Publication

instances, including details about how the study was performed (represented as a

Study Protocol), containing data such as the number of fractions into which the radiation

dose was separated. Using relational properties (e.g., hasOutcome), the Study

Protocol can be linked to specific Endpoints observed during the study.
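The relationships just described can be pictured with a small Python sketch. Publication, Study Protocol, Endpoint, and hasOutcome are named in the text; the field names and the example values are assumptions for illustration and do not reproduce the actual RTGO schema.

```python
from dataclasses import dataclass, field

@dataclass
class Endpoint:
    name: str
    grade: int

@dataclass
class StudyProtocol:
    fractions: int  # number of fractions into which the dose was separated
    has_outcome: list = field(default_factory=list)  # linked Endpoint objects

@dataclass
class Publication:
    title: str
    year: int
    protocol: StudyProtocol

# Hypothetical instance of the class Publication.
paper = Publication(
    title="Hypothetical dose-volume study",
    year=2005,
    protocol=StudyProtocol(fractions=33),
)

# Link the protocol to an endpoint observed during the study (hasOutcome).
paper.protocol.has_outcome.append(Endpoint("bleeding", 2))

endpoint_names = [endpoint.name for endpoint in paper.protocol.has_outcome]
print(endpoint_names)  # ['bleeding']
```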

4.5.3 Direct Access to Abstracts and Full Papers

Although an ideal knowledge extraction process will represent every aspect contained

in the narrative text of a paper, there are times when the actual text of the publication

may need to be referenced - either solely the abstract or the publication in full. In

RTGO, when a search is performed based on parameters specified in the filters, a list

of links is displayed, allowing direct access to both the abstract and full text of

the paper. Figure 4.7 demonstrates this feature.

Figure 4.7: Link to View Abstracts and Full Text of Publications

Chapter 5: Software Engineering Analysis

In the development of any computer software, certain evaluations should be per-

formed in order to ascertain how well the application meets the intended needs. While

there is no single methodology for evaluating a software’s correctness, efficiency, and

scalability, the International Organization for Standardization (ISO) 9126 standard for software

quality evaluation is one of the most commonly used benchmarks for this task

[12]. The ISO 9126 standard outlines a variety of internal, external, and in-use quality

metrics to be evaluated, including (1) accuracy/functionality – whether or not the

data presented by the program is accurate and functions properly, (2) efficiency – how

efficiently the program meets the needs of the user, and (3) scalability/maintainability

– how feasible growth is within the program and the data.

The ISO 9126 standard has been used in the evaluation of a variety of software

designs [6, 1], and is recommended for a variety of reasons: (1) it is a very com-

prehensive standard, evaluating a wide variety of aspects of software; (2) it places

strong emphasis on the needs of users, using wording that is common and understood

by users, developers, and evaluators alike; (3) it provides for an objective analysis

of software quality; and (4) it establishes a reproducible process of evaluation [8].

Although there is always room for improvement, RTGO scores well in an evaluation

based on the ISO 9126 standard.

5.1 Software Engineering Dimensions

Using the ISO 9126 standard, three key dimensions were analyzed: accuracy, effi-

ciency, and scalability. Access to RTGO (both the structured ontology and the GUI

plug-in) was provided to medical physicists trained in radiation treatment planning

to analyze and evaluate the ontology in the three key dimensions. As the true focus

of this thesis is on the underlying ontology, the evaluation focuses primarily on how

well the ontology serves as a new model of data representation. The GUI plug-in pro-

vides easy access to the data contained within the ontology, but it was the ontological

representation of the data that was primarily being evaluated.

5.1.1 Accuracy

In the case of RTGO, or any biomedical ontology, accuracy of information being

represented is critical. The terminology and data must be modeled with no errors or

omissions. Stvilia defines accuracy as “The extent to which information is legitimate

or valid according to some stable reference source, such as a dictionary or set of

domain constraints and norms” [25]. For the radiation treatment planning domain,

accurate translation of concepts from the QUANTEC papers into concepts in the

ontology is the primary metric. Any concepts contained in RTGO must be valid

according to the source.

5.1.2 Efficiency

Defined as the “[r]elationship between the level of performance of the software and

the amount of resources used, under stated conditions, taking into account elements

such as the time response, or memory consumption” [8], efficiency is a metric in which

RTGO aims to provide substantial gains for the radiation treatment planning domain.

Although efficiency is often a metric used to quantify the algorithmic efficiency and/or

resource consumption of a machine, a slight adaptation to the definition was utilized

in the analysis of RTGO. While the algorithmic nature of the software is certainly

not insignificant, the goal of RTGO is to provide an increase in human efficiency of

data access. Although efficiency viewed in this light is not as concretely quantifiable

(with no static source as in accuracy measurement), evaluators were asked to rate the

ontological structure of RTGO based on the anticipated increase or decrease in the

amount of time required to access data using RTGO compared with their previous

clinical experiences referencing literary publications.

5.1.3 Scalability

Scalability is defined here as “[a]ttributes that bear on the effort needed to make specified

modifications.” A data storage model is not useful if it cannot easily accommodate new data

being added over time. Due to the nature of the Semantic Web, this often-challenging

metric is one where ontologies can prove to be very scalable. As in the example of

the Presidents ontology maintained by an existing organization, the reuse of in-

formation aids substantially when it comes to scalability; previous information and

existing structures never have to be reinvented, and – assuming the ontology struc-

ture is designed appropriately – new information fits right into the model. Evaluators

were asked to rate the ontological structure of RTGO based on the anticipated ease

of adding additional studies, concepts, and data into the ontology.

5.2 ISO 9126 Analysis of RTGO

Similar to other ontology analysis studies [8], a simple evaluation of the ontology

was performed by individuals already knowledgeable about the design process and

implementation of an ontology. Four medical physicists experienced in radiation

treatment planning were used in the study. Each was given approximately 60 minutes

to study the structure and features of RTGO. Each evaluator was provided with a

form containing the definitions of each dimension, as quoted in Sections 5.1.1–5.1.3.

On this form, evaluators were asked to quantitatively rate the three software design

dimensions from 1 (worst) to 5 (best). In addition, evaluators were able to give

open-ended comments on each of the dimensions.

5.3 Software Engineering Analysis Results

The scores from the evaluators were collected and averaged. A summary of the

numeric scores from each evaluator is presented in Table 5.1. The results were also

plotted on a radar graph, which allows for easy visualization of each evaluator’s scores

compared to the possible range of values (1-5) and with respect to each other. Each of

the three axes on the graph represents one of the software quality dimensions being evaluated:

accuracy, efficiency, and scalability. Each axis runs from a rating of 1 at the origin

(lowest score) to a rating of 5 at the apex (highest score). The results of the evaluation

are presented in Figure 5.1.

Table 5.1: ISO 9126 Evaluation of RTGO

Dimension     Evaluator 1   Evaluator 2   Evaluator 3   Evaluator 4   Average
Accuracy      5             5             5             4             4.75
Efficiency    4             4             5             5             4.5
Scalability   3             3             4             3             3.25
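The averages in Table 5.1 follow directly from the four individual ratings, as the short check below shows. The ratings are those reported in the table; the variable names are illustrative.

```python
# Ratings from Table 5.1, in evaluator order 1 through 4.
scores = {
    "Accuracy": [5, 5, 5, 4],
    "Efficiency": [4, 4, 5, 5],
    "Scalability": [3, 3, 4, 3],
}

# Arithmetic mean of the ratings for each dimension.
averages = {dimension: sum(ratings) / len(ratings)
            for dimension, ratings in scores.items()}

print(averages)  # {'Accuracy': 4.75, 'Efficiency': 4.5, 'Scalability': 3.25}
```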

Figure 5.1: Results of ontology evaluation. Four series represent the individual evaluator ratings. The fifth series represents the average of the ratings.

Chapter 6: Conclusion and Future Work

The fact that medical study data is presently modeled in a manner that does not

allow for easy computation or specialization limits the practicality of using that data.

Pooling of the data in literature can be done manually, and calculations can then be

performed, but ontologies allow for the data to be represented in a calculable format.

The design of an ontology takes careful consideration, but many existing ontologies

are already well into development, providing the building blocks for domain-specific

ontologies to apply the powerful reasoning abilities of the Semantic Web principles

with minimal repetition of work. Collaborative datasets become increasingly powerful

and applicable to real-world situations, reaching far beyond the capabilities of static

literary text.

6.1 Discussion of Evaluation

The evaluation of RTGO indicates that, overall, RTGO scores well in terms of ontology

design. The accuracy and efficiency of the ontology were both rated very strongly,

with averages of 4.75 and 4.5, respectively. The scalability of the ontology was

determined to be slightly weaker, with an average rating of 3.25. Because this

version of RTGO is the very first attempt at constructing the ontology, these results

are not surprising. The accuracy of concepts reflects the successful transfer of terms

and knowledge from the literary publications. The evaluation of efficiency indicates that

the tool has strong potential for increasing the efficiency of accessing data

in radiation treatment planning. The scalability rating, however, indicates

room for improvement, which is unsurprising for any ontology in the initial stages of

development.


As with any software, the development process is not a one-time, start-to-finish

method. Successful software products must constantly be reviewed to determine how

well they meet the intended goals. Given that the current version of RTGO was

the initial attempt at constructing the ontology structure, and given the ontology

design challenges discussed in Section 3.3, a few changes are proposed to improve RTGO.

Currently, toxicity and endpoints are represented in RTGO as individuals. When utilizing

these endpoints, however, it became clear that with the various endpoint classification

publications (such as CTCAE and LENT-SOMA discussed in Section 4.4.4),

in addition to the severity grades, the individuals became unwieldy.

As noted by the evaluators, these toxicity definitions contributed to a lower rating

for scalability in RTGO’s current form. Reworking the representation of medical

endpoints by providing additional sub-classes and reducing the number of individuals

required would improve the scalability. With some slight adjustments like these,

future versions of RTGO are expected to be more scalable, allowing for all types of

data (additional studies, organs, endpoint classifications, etc.) to be added easily and

logically.

Human evaluation of software is, in general, an inherently somewhat subjective

process. The first component, accuracy, is relatively objective; concepts should be

spelled correctly and placed accurately in the hierarchy. However, there is not always

one correct – or accurate – way to represent certain concepts. Blood, for example,

can be modeled as a bodily fluid, or as a collection of several types of cells. Since

ontologies allow for multiple inheritance, blood can be modeled in both

ways. To have a complete, accurate ontology, all existing concepts should be

related accurately with respect to each other.
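
The multiple-inheritance point can be illustrated with a small sketch. The Python below stores a sub-class hierarchy as a dictionary and collects every superclass of a concept. The class names (`BodilyFluid`, `CellCollection`, and so on) are hypothetical placeholders chosen for illustration; they are not taken from RTGO, the FMA, or SNOMED.

```python
# Hypothetical sub-class hierarchy: each concept maps to its direct superclasses.
# "Blood" has two parents, which is what multiple inheritance permits.
SUBCLASS_OF = {
    "Blood": ["BodilyFluid", "CellCollection"],
    "BodilyFluid": ["BodySubstance"],
    "CellCollection": ["AnatomicalStructure"],
    "BodySubstance": [],
    "AnatomicalStructure": [],
}


def ancestors(concept, hierarchy):
    """Collect every superclass reachable from `concept`, direct or inherited."""
    found = set()
    stack = list(hierarchy.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(hierarchy.get(parent, []))
    return found


blood_ancestors = ancestors("Blood", SUBCLASS_OF)
print(sorted(blood_ancestors))
```

Under this sketch, a query for bodily fluids and a query for cell collections would both reach `Blood`, so neither modeling choice has to be discarded.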

In the future, once RTGO is a shelf-ready product, objective evaluations of the actual efficiency

of RTGO could be carried out by a precise comparison of the time required to reference


data using RTGO versus literary publications. In the current initial stage of RTGO,

however, a degree of subjectivity and approximation is necessary. Given that the

evaluators were all experienced in radiation treatment planning, they could contribute

objective experience and knowledge of the current approach to the development of

a treatment plan. Although RTGO does not yet contain a full store of concepts

and study data, the basic ontology provided a foundation upon which the evaluators

could carefully – although subjectively – consider how efficient the tool would be in

the real-world environment.

The scalability of RTGO was also a facet in which both objectivity and subjectivity

were necessary. Objectively, if a concept such as a knee were modeled as an

individual, it would need to be instantiated every time it was used. Clearly, this is

not a scalable implementation for the concept of a knee, but the degree of decreased

scalability resulting from poor implementation decisions was certainly a subjective

determination.

Overall, although the subjectivity of the evaluation is a factor, additional evaluations

with more evaluators would provide better data and help account for

variance in opinions.

6.2 Ontologies in Other Disciplines

The development and analysis of RTGO has shown that ontological representation of

knowledge can provide many benefits over traditional narrative text publications. With

the existence of anatomical and clinical ontologies in the FMA and SNOMED, the

building blocks for medical ontologies are already in place. While RTGO focuses on

storing data pertaining specifically to radiation treatment planning, it is not hard to

see how a similar ontology could be built for similar domains, especially those with

limited definitive treatment protocols like dementia or Alzheimer’s disease. Ontologies


can help structure research data for these diseases that are not well understood with

a model that is more “disciplined” [20].

6.3 Future Work

RTGO in its current form lays the groundwork for even more full-featured ontology-

based systems to be developed, both in the field of radiation treatment planning and

beyond. With additional search filters and more direct integration with external data,

the tool could substantially change the way in which physicians access radiation treatment data.

6.3.1 Additional Filters

The three filters implemented in this version of RTGO (organ, date range, and

study size) serve as a proof of concept that knowledge representation within

an ontology provides the radiation treatment planning domain with a structure that

allows for data to be calculable as well as specializable. Continued development is

in progress to expand the plug-in with additional filters and search tools. Additional

filters will allow more fine-tuned search to be performed and will provide even better

accessibility to desired data. The ability for users to access the most relevant data

allows for more efficient and accurate treatment planning.
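
To make the three filters concrete, the sketch below applies an organ, date range, and study size filter to a list of study records. It is written in Python purely for illustration and does not reproduce the RTGO plug-in code; the `Study` record, the `filter_studies` function, and the three sample entries are hypothetical placeholders, not real study data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Study:
    """Hypothetical stand-in for a study instance stored in the ontology."""
    title: str
    organ: str
    published: date
    patients: int


def filter_studies(studies, organ=None, start=None, end=None, min_patients=None):
    """Keep the studies that satisfy every filter that has been set."""
    result = []
    for study in studies:
        if organ is not None and study.organ.lower() != organ.lower():
            continue
        if start is not None and study.published < start:
            continue
        if end is not None and study.published > end:
            continue
        if min_patients is not None and study.patients < min_patients:
            continue
        result.append(study)
    return result


# Placeholder records for illustration only.
studies = [
    Study("Study A", "rectum", date(2004, 5, 1), 120),
    Study("Study B", "bladder", date(2008, 9, 15), 45),
    Study("Study C", "rectum", date(2009, 2, 10), 300),
]

recent_large_rectum = filter_studies(
    studies, organ="rectum", start=date(2005, 1, 1), min_patients=100
)
print([study.title for study in recent_large_rectum])
```

Because each filter is optional and independent, further filters could be added as extra keyword arguments without changing the existing ones.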

In addition, once the RTGO model is in place, the way in which future radiation

treatment studies are conducted can be adjusted, because this new format of data

storage encourages studies to be more specialized. Previously, there was less of

an impetus for researchers to focus on the specifics (e.g., age, gender, ethnicity)

of the patients involved. With a model that not only allows for, but thrives on,

specialized information, we believe that RTGO will encourage improvement in the

documentation of this type of information. The relational aspect of an ontology brings

all the data closer together, allowing for the knowledge base to become a better and


better resource over time.

6.3.2 Integration With Journals for Full Text Access

In the current design of RTGO, the access to full-text copies of literary resources is

implemented relatively simplistically. The PDF files of the publications reside locally

on the machine running the RTGO plug-in, and are simply accessed through the

local directory, which is known to RTGO. In the future, however, integration

with external journals would not be difficult. Since each publication is represented

as an instance, an additional property could be added containing the URL for the

location of the article. The actual specifics of subscription verification and authorized

access to the journal databases would require some careful programming, but from

the perspective of RTGO, the plug-in offers a tool that is ready for the bridge to be

developed.
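
As a rough sketch of the proposed change, a publication instance could carry an optional URL property and fall back to the local PDF directory when no URL is recorded. The Python below is illustrative only: the `Publication` record, the `full_text_url` property name, the directory path, and the example URL are all assumptions, and subscription verification is deliberately left out.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class Publication:
    """Hypothetical stand-in for a publication instance in the ontology."""
    title: str
    pdf_filename: str
    full_text_url: Optional[str] = None  # assumed name for the additional property


def full_text_location(publication, local_dir):
    """Prefer the journal URL when present; otherwise use the local PDF copy."""
    if publication.full_text_url:
        return publication.full_text_url
    return str(PurePosixPath(local_dir) / publication.pdf_filename)


# Placeholder instances; the path and URL are made up for illustration.
local_only = Publication("Example paper", "example.pdf")
linked = Publication(
    "Example paper", "example.pdf", "https://journal.example.org/article/123"
)

print(full_text_location(local_only, "/data/rtgo/pdfs"))
print(full_text_location(linked, "/data/rtgo/pdfs"))
```

Keeping the local lookup as a fallback means existing instances without a URL would continue to work unchanged.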

6.3.3 Integration With Existing High Level Ontologies

As discussed in Section 3.1.1, the FMA and SNOMED ontologies were used as ref-

erences in designing the roots of the structure for RTGO. For simplicity, the classes

in RTGO were extracted from the QUANTEC paper (i.e. the classes necessary for

the specific RTGO domain) and the ontology structure was built by manually adding

these concepts. With ontologies like the FMA and SNOMED already in existence,

however, using these top-level ontologies as building blocks can expand the range of

RTGO immensely.

A successful integration between the ontologies would provide RTGO with full

access to the thousands of concepts already defined in the FMA and SNOMED.

Although the actual process of ontology integration and combination is beyond the

scope of RTGO, this kind of integration and alignment process will allow RTGO to


model an incredible amount of knowledge as a result of the advantages provided by

the Semantic Web backbone.

6.4 Final Words

Based on the principles of the Semantic Web, ontologies provide a model for knowl-

edge representation that introduces many improvements to the current model. The

reasoning, classification, and expansion capabilities allow for data to become more

useable, strongly supporting collaboration and reuse. Ontologies allow for knowledge

to be captured in a way that allows for inferences to be made computationally as

opposed to manually. In addition, access to specialized data is vastly improved when

data is stored in an ontology compared to static literary publications. In the medical

community in particular, the amount of knowledge is incredibly expansive, but the

way that the knowledge is accessed has barely changed over centuries. With the power

of computing available today, it is time for medical research data to become smarter,

more accessible, and more powerful. RTGO harnesses the power of the Semantic Web

to bring this power to the radiation treatment planning domain, and the evaluation

shows great potential for the improvement of RTGO in the future.


Bibliography

[1] Al-Kilidar, H., Cox, K., Kitchenham, B. The use and usefulness of the ISO/IEC

9126 quality standard. In Proceedings of the International Symposium on Em-

pirical Software Engineering, November, 2005, Noosa Heads, Australia.

[2] Allemang, D. and J. Hendler. Semantic Web for the Working Ontologist: Effective

Modeling in RDFS and OWL. Morgan Kaufmann Publishing. 2008.

[3] Bentzen, SM., Constine, LS., Deasy, JO., et al. Quantitative Analyses of Nor-

mal Tissue Effects in the Clinic (QUANTEC): An Introduction to the Scientific

Issues. Int. J. Radiation Oncology Biol. Phys., Vol. 76, No. 3, Supplement, pp.

S3–S9, 2010.

[4] Bentzen, SM., Parliament, M., Deasy, JO., et al. Biomarkers and surrogate end-

points for normal-tissue effects of radiation therapy: the importance of dose-

volume effects. Int. J. Radiation Oncology Biol. Phys., Vol. 76, No. 3, Supplement,

pp. S145–S150, 2010.

[5] Buitelaar, P., Cimiano, P., and Magnini, B. Ontology Learning from Text: An

Overview. In Proceedings of the European Conference on Artificial Intelligence

and the International Conference on Knowledge Engineering and Management.

2004.

[6] Chua, BB., Dyson, LE. Applying the ISO 9126 model to the evaluation of an

e-learning system. In Proceedings of the 21st ASCILITE Conference, December

2004, Perth, Australia.


[7] Common Terminology Criteria for Adverse Events.

http://evs.nci.nih.gov/ftp1/CTCAE/CTCAE-4.03-2010-06-14-QuickReference-

8.5x11.pdf. 2010.

[8] Fernandez-Breis, JT., Aranguren, ME., and Stevens, R. Quality evaluation

framework for bio-ontologies. In Nature Proceedings, July, 2009.

[9] Foundational Model of Anatomy.

http://sig.biostr.washington.edu/projects/fm/AboutFM.html. 2011.

[10] Hunt, MA., Jackson, A., Narayana, A., et al. Geometric factors influencing dosi-

metric sparing of the parotid glands using IMRT. Int. J. Radiation Oncology Biol.

Phys., Vol. 66, No. 1, pp. 296–304, 2006.

[11] i2c Why Ontologies? The Centre for Informa-

tion Systems in Infrastructure and Construction.

http://i2c.engineering.utoronto.ca/I2C/Data/Ontology/WhyOntologies.aspx.

2005.

[12] ISO/IEC 9126. http://en.wikipedia.org/wiki/ISO/IEC-9126.

[13] Jackson, A., Marks, LB., Bentzen, SM. et al. The lessons of QUANTEC: Rec-

ommendations for reporting and gathering data on dose-volume dependencies of

treatment outcome. Int. J. Radiation Oncology Biol. Phys., Vol. 76, No. 3, Supplement,

pp. S155–S160, 2010.

[14] Jaffray, DA., Lindsay, PE., Brock, KK., et al. Accurate accumulation of dose for

improved understanding of radiation effects in normal tissue. Int. J. Radiation

Oncology Biol. Phys., Vol. 76, No. 3, Supplement, pp. S135–S139, 2010.


[15] Jeraj, R., Cao, Y., Haken, RK., et al. Imaging for assessment of radiation-

induced normal tissue effects. Int. J. Radiation Oncology Biol. Phys., Vol. 76,

No. 3, Supplement, pp. S140–S144, 2010.

[16] Marks, LB., Yorke, ED., Jackson, A., et al. Use of normal tissue complication

probability models in the clinic. Int. J. Radiation Oncology Biol. Phys., Vol. 76,

No. 3, Supplement, pp. S10–S19, 2010.

[17] Michalski, J., Gay, H., Jackson, A., et al. Radiation Dose-Volume Effects in

Radiation-Induced Rectal Injury. Int. J. Radiation Oncology Biol. Phys., Vol.

76, No. 3, Supplement, pp. S123–S129, 2010.

[18] National Center for Biomedical Ontology BioPortal.

http://bioportal.bioontology.org/. 2011.

[19] Palta, M., and R. Lee. The development of oncology treatment guidelines: an

analysis of the National Guidelines Clearinghouse. Practical Radiation Oncology

(2011) 1, 33–37.

[20] Perl, Y., Geller, J., Gu, H. Identify a forest hierarchy in an OODB specialization

hierarchy satisfying disciplined modeling. In Proceedings of the First IFCIS

International Conference on Cooperative Information Systems (COOPIS ’96),

1996.

[21] Protege developer documentation. http://protege.stanford.edu/doc/dev.html

2011.

[22] Rosse, C., Mejino, JL. A reference ontology for biomedical informatics: the

Foundational Model of Anatomy. Journal of Biomedical Informatics 36 (2003) 478–500.

[23] SNOMED Clinical Terms Summary. http://bioportal.bioontology.org/ontologies/42122.

2011


[24] Stanford Center for Biomedical Informatics Research: What is Protege?.

http://protege.stanford.edu/overview/index.html. 2011.

[25] Stvilia, B. A model for ontology quality evaluation. In First Monday, December,

2007.

[26] Temal, L., Dojat, M., Kassel, G., Gibaud, B. Towards an ontology for sharing

medical images and regions of interest in neuroimaging. Journal of Biomedical

Informatics 41 (2008) 766–778, 2007.

[27] Viswanathan, AN., Yorke, ED., Marks, LB., et al. Radiation Dose-Volume Effects

of the Urinary Bladder. Int. J. Radiation Oncology Biol. Phys., Vol. 76, No. 3,

Supplement, pp. S116–S122, 2010.


Vita

Thomas M. Minta

Department of Computer Science

Wake Forest University

Post Office Box 7311

Winston-Salem, NC 27109

Education

M.S. Computer Science, Wake Forest University, 2011.

B.S. Computer Science, Wake Forest University, 2008.

Employment

Technology Analyst, University Advancement, Wake Forest University, May 2011-Present.

Research

Minta, T.M., Fulp, E.W., Ge, Y., Turkett, W.H. (2011) Ontological Representation of

Radiation Treatment Data.

Conference and Seminar Presentations

A Comparison of Static to Biologically Modeled Intrusion Detection Systems, Fukuoka

Institute of Technology, November 6, 2010.
