ONTOLOGICAL REPRESENTATION OF RADIATION TREATMENT DATA
BY
THOMAS M. MINTA
A Thesis Submitted to the Graduate Faculty of
WAKE FOREST UNIVERSITY GRADUATE SCHOOL OF ARTS AND SCIENCES
in Partial Fulfillment of the Requirements
for the Degree of
MASTER OF SCIENCE
Computer Science
August 2011
Winston-Salem, North Carolina
Approved By:
Yaorong Ge, Ph.D., Advisor
Errin W. Fulp, Ph.D., Chair
William H. Turkett, Ph.D.
Acknowledgments
Dr. Yaorong Ge, Dr. Errin Fulp, and Dr. William Turkett,
Your support, advice, and encouragement over the years have been truly remarkable. Year after year, you’ve been incredible references for all that I’ve needed.
Wake Forest University Department of Computer Science
I couldn’t imagine a more supportive department, both faculty and students alike. I sincerely thank everyone who has been involved in my studies in one way or another.
My friends and family
I owe a tremendous amount to all who have provided me guidance and direction not only in my studies, but in life.
Table of Contents
Acknowledgments
List of Figures
Abstract
Chapter 1 Introduction
1.1 Radiation Treatments
1.2 Current Approach to Radiation Treatment Planning
1.3 The Need for a New Treatment Planning Approach
Chapter 2 Semantic Web and Ontologies
2.1 The Semantic Web
2.2 Reasoning and Inferring
2.3 RDF: Resource Description Framework
2.4 OWL: Web Ontology Language
2.5 Applicability of Semantic Web Principles for Treatment Planning
2.6 Comparing Ontologies to Relational Databases
Chapter 3 Radiation Treatment Guideline Ontology
3.1 Radiation Treatment Resources
3.1.1 Existing High-Level Ontologies
3.2 Concept Extraction from QUANTEC Papers
3.3 Challenges Building the structure of RTGO
Chapter 4 RTGO GUI Plug-In Development
4.1 Protege Overview
4.2 Developing Plug-Ins for Protege
4.3 RTGO GUI Plug-In
4.4 Dose-Volume Relationships
4.4.1 Challenges Defining Organ Volume
4.4.2 Challenges Defining Radiation Doses
4.4.3 Challenges Defining Side Effects of Radiation
4.4.4 CTCAE: Common Terminology Criteria for Adverse Events
4.4.5 Acute vs. Late Endpoints
4.5 Features of RTGO
4.5.1 Filters
4.5.2 Additional Study Details
4.5.3 Direct Access to Abstracts and Full Papers
Chapter 5 Software Engineering Analysis
5.1 Software Engineering Dimensions
5.1.1 Accuracy
5.1.2 Efficiency
5.1.3 Scalability
5.2 ISO 9126 Analysis of RTGO
5.3 Software Engineering Analysis Results
Chapter 6 Conclusion and Future Work
6.1 Discussion of Evaluation
6.2 Ontologies in Other Disciplines
6.3 Future Work
6.3.1 Additional Filters
6.3.2 Integration With Journals for Full Text Access
6.3.3 Integration With Existing High Level Ontologies
6.4 Final Words
Bibliography
Vita
List of Figures
2.1 Basic Family Member Ontology Structure
2.2 More Detailed Family Member Ontology Structure
2.3 Family Member Ontology Structure with Relationships
2.4 Inferred Structure of Family Member Ontology
3.1 Extraction of Concepts from QUANTEC Papers
3.2 Graphical Representation of Top Two Layers of RTGO
4.1 Protege Interface
4.2 Protege Interface Editing Class Hierarchy of RTGO
4.3 Accessing the RTGO Plug-in from the Protege Interface
4.4 RTGO Plug-in Interface
4.5 Dose-Volume Histogram for Radiation Effects on the Rectum [17]
4.6 CTCAE Version 4.0 Classification of Gastrointestinal Adverse Events [7]
4.7 Link to View Abstracts and Full Text of Publications
5.1 Results of ontology evaluation. Four series represent the individual evaluator ratings. The fifth series represents the average of the ratings
Abstract
Thomas M. Minta
Radiation treatment, or the use of powerful electromagnetic waves to treat malignant growths in the body, has become very common in the treatment of cancer patients. Radiation, however, can be catastrophic if not utilized with the utmost caution; thus, meticulous treatment plans are absolutely essential. Currently, the primary resources available to physicians for planning the specific doses of radiation to be prescribed are found in published literature of previous studies. As merely textual resources, however, these publications do not lend themselves well to any sort of reasoning, computation, or specialization. Consequently, the knowledge contained within these papers is not being captured in the most usable format. To remedy this, a different approach to the extraction of this knowledge is necessary, coupled with a new model for representation of the knowledge in order to provide for data that is both more efficient and accessible.
An ontology, which can be thought of as a class-based system that builds a web of concepts interconnected by relational properties, lends itself well to this purpose; the physical (and non-physical) entities involved with radiation treatment planning exist as classes, and relationships between these classes are defined by properties. Since an ontology is similar to a database, storing data and relationships between the data, it provides the ability for a constantly growing source of knowledge. Rather than static, independent sources of data, new data is entered into an existing ontology, which immediately can perform calculations and inferencing on the new data. This results in a dynamic knowledge representation model that can continually provide the most relevant data possible. To demonstrate and explore the benefits of an ontological representation of knowledge, the Radiation Treatment Guideline Ontology (RTGO) takes advantage of this new approach. RTGO shows how the knowledge contained within the text of existing publications can be modeled in a way that allows for both computation and specialization when utilized. Using established software engineering criteria, we analyze the correctness, efficiency, and scalability of this approach. In addition, we explore the potential applicability of similar ontological knowledge representation in other disciplines. Ultimately, RTGO shows that building an accurate, efficient, and scalable ontology to represent the knowledge contained within radiation treatment guideline papers provides access to the data in ways that were previously much more difficult, if not impossible.
Chapter 1: Introduction
The power of computation has made truly remarkable gains in the past half-
century. Computer systems today are capable of calculating inordinate amounts of
numerical data at lightning fast speeds. From credit card transaction processing
to airline reservation systems, our world has developed capabilities that bring an
incredible amount of data to our fingertips. When it comes to the storage and use of
non-numeric knowledge, however, progress is not nearly as pronounced. For example,
we have computer systems that can process millions of medical records to analyze any
number of statistics in a patient’s health history, but when it comes to “calculating”
how that history translates to subsequent knowledge, such as a diagnostic plan, our
current abilities are not nearly as accomplished.
1.1 Radiation Treatments
The use of extremely strong waves of energy, commonly known as radiation, to treat
cancerous growths has revolutionized medicine. The ultimate goal of using radiation
is to destroy malignant cells, which are defined as cells that replicate with no regard to
responses from surrounding cells to suspend replication. While normal cells recognize
these signals to cease replication coming from other nearby cells, malignant cells
ignore the signals and can grow into cancerous masses. Radiation is capable of killing
these cells that grow uninhibitedly, but it is equally capable of killing normally
functioning cells, such as cells in vital organs like the heart or lungs.
External beam radiation therapy (EBRT), which was first introduced in the 1950s,
utilizes beams of radiation that can be aimed at specific areas of the body [3]. As
a result of the power and potential dangers of radiation treatment, meticulous planning
of the implementation of radiation treatments is absolutely critical. Too much
exposure of any cell to radiation can kill it, but determining how much is too much
depends on a myriad of factors.
1.2 Current Approach to Radiation Treatment Planning
In order to develop appropriate treatment plans, radiation oncologists utilize a num-
ber of different resources, such as clinical practice guidelines, peer-reviewed research
studies, and medical imagery like MRIs or CT Scans [19]. Cells that are functioning
properly (i.e. not part of a malignant growth) make up normal tissue; damage to
this normal tissue should ideally be minimized. With all the external resources, the
physician’s goal is to develop a treatment plan that will minimize these normal tissue
effects, while maximizing damage to the malignant cells. Unfortunately, achieving
this goal is complicated by the fact that different types of tissue in the body (e.g.
muscle tissue, lung tissue, or bladder tissue) react very differently to radiation [4].
As a result of the complexities involved with radiation treatment, the development
of a successful radiation treatment plan must take into consideration all types of
tissue surrounding the cancerous growth. Many studies on radiation treatment effects
have been published (examples include [16, 17, 27]), and these publications are very
beneficial to physicians in treatment planning. Looking forward, however, the data
could be harnessed in a much more useful format. Two big drawbacks of the data in
its present form are (1) the lack of computability – support for automated processes
of reasoning, manipulation, and analysis and (2) the difficulty of specialization – the
ability to directly apply existing data to specific cases.
Publications on normal tissue effects of radiation treatment require significant
time to accurately conduct a study to produce the data – often 5 to 7 years or more
[17]. The results of these studies are commonly presented as recommended doses.
At this point, however, the quantitative analysis does not progress much further.
Once the paper has been published, it is used as a reference, but it is a static piece of
information. As a result, this data contained within literature, being represented only
in narrative text, does not lend itself well to computability; any type of comparing,
contrasting, inferring, or drawing conclusions on the data is a manual process.
In addition to the lack of computability, existing data does not allow for easy
specialization. If a patient, for example, is elderly with emphysema, his lung tissue
will likely experience very different effects than a young, healthy patient. Of course,
physicians are not blindly following generalizations, and they are fully capable of
making adjustments based on experience and other sources of knowledge. Modeling
the data in a manner that allows for more detailed specialization, however, would
help tailor treatments to the individual characteristics of the patient at hand.
1.3 The Need for a New Treatment Planning Approach
A simple solution might appear to be a push for more literature to be published
with more specific cases, allowing for a broader reference base, but published literature
by its very nature can be cumbersome and is not always accessible. Taking the
approach of a broader reference base naturally leads to the physician needing to sift
through more papers in an attempt to find data relevant to the case at hand. Referring
to any discipline, Allemang describes the desire to make data “smarter,” meaning
that its representation does not just exist to be referred to later and manually used for
calculations [2]. Rather, the data should be automatically available to other sources
of information and provide the ability for intelligent conclusions to be drawn, without
the need for a person to consciously seek out and apply the data. In order for this to
occur, a new model is needed for storing the data [2]. In the medical field, the neces-
sity for “disciplined modeling” of concepts [20] has become increasingly pronounced
[22]. By disciplined, we are referring to capturing the data in a more structured, scal-
able, and accessible format as opposed to independent publication after publication
containing redundant information.
In order to investigate the application of a more disciplined model to radiation
treatment planning, this thesis explores a new approach – an ontological representation
of the data. Since textual representation of information lends itself to
ambiguity in terminology (e.g. “George Bush” could refer to several different individuals),
it is necessary to explicitly clarify what the term at hand means. Through
the exploration of an ontological representation of data, this thesis demonstrates how
building an accurate, efficient, and scalable ontology to represent the knowledge con-
tained within radiation treatment guideline papers provides access to the data in ways
that were previously much more difficult, if not impossible. Chapter 2 introduces
ontologies, the proposed knowledge structure in which radiation treatment data can
be stored, and discusses the benefits that this structure provides. Chapter 3 presents
the Radiation Treatment Guideline Ontology (RTGO) that was built to model the
data at hand. Chapter 4 introduces the Protege software used for development and
the interface built on top of the ontology. Chapter 5 provides analysis of the tool
developed, and Chapter 6 offers concluding remarks on RTGO and where the tool
can lead in the future.
Chapter 2: Semantic Web and Ontologies
As the name implies, the Semantic Web is a developing technology focused on
representing knowledge in a connected web, with the construction focusing on the
“semantics,” or meaning, of the knowledge. By storing information in this type of
web, based on a structure that reflects the actual entity in the world being represented,
relationships between the data can be better understood [2]. Using the Semantic
Web approach, the needs of computability and specialization of data can be much
better achieved. In the field of radiation treatment planning, this allows previous
publications to serve as calculable sources of data. Ideally, with more useful sources
of data, better treatment planning will follow. Although the Semantic Web approach
may initially seem a bit abstract, it is much more easily understood when considering
real-world example situations.
2.1 The Semantic Web
In a conversation or a piece of literature, the name “Thomas Jefferson” most likely
refers to the third president of the United States of America. On the other hand,
it could also refer to the American Dixieland jazz trumpeter born in 1920, of the
same name. While existing models of knowledge representation (specifically, the
narrative text of published literature) lend themselves to this sort of ambiguity, the
Semantic Web provides a framework for linking a term, such as “Thomas Jefferson,”
to the actual entity in the world to which the term refers. By providing relationships
between entities (such as having a certain ID number, or being married to another
individual, or being located at a certain location), the Semantic Web’s representation
of data allows for inferences to be drawn. In other words, the Semantic Web helps
model data in such a way that it can be calculable.
The structure of the Semantic Web’s representation of data can be thought of
as similar to an object-oriented programming approach. Objects in the world are
represented as a class, such as a Person or a President. Classes can be instantiated
as individuals, such as Thomas Jefferson. Properties can be defined to provide rela-
tionships between classes or individuals. One key aspect, however, is the support for
multiple inheritance – an individual may be a member of multiple classes. The individual
Thomas Jefferson can be defined as a President, as well as a Husband and
a Father. By saying that Thomas Jefferson is a member of the class Husband, the
individual can automatically possess the relational property, hasSpouse. Likewise,
since Thomas Jefferson is also a Father, it is known that he has at least one child
(assuming that having a child is our definition of a father). This approach begins
to unlock the true power of the Semantic Web structure, in that class membership
becomes akin to set theory.
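As a rough illustration of this set-based view, the following Python sketch models each class as a set of individuals and derives the properties implied by membership. The individual HypotheticalPerson, the property hasChild, and the helper names are assumptions introduced only for this example; they do not come from any ontology language or from RTGO.

```python
from typing import Dict, Set

# Each class is modeled as the set of individuals that belong to it.
# "HypotheticalPerson" is an invented individual used for contrast.
CLASS_MEMBERS: Dict[str, Set[str]] = {
    "President": {"ThomasJefferson"},
    "Husband": {"ThomasJefferson", "HypotheticalPerson"},
    "Father": {"ThomasJefferson"},
}

# Properties that membership in a class implies (assumed for illustration).
CLASS_PROPERTIES: Dict[str, Set[str]] = {
    "Husband": {"hasSpouse"},
    "Father": {"hasChild"},
}


def classes_of(individual: str) -> Set[str]:
    """Return every class whose member set contains the individual."""
    return {c for c, members in CLASS_MEMBERS.items() if individual in members}


def implied_properties(individual: str) -> Set[str]:
    """Collect the properties implied by all classes the individual belongs to."""
    props: Set[str] = set()
    for cls in classes_of(individual):
        props |= CLASS_PROPERTIES.get(cls, set())
    return props
```

Because membership is plain set membership, an individual that belongs to several classes simply accumulates the properties of each one, which is the sense in which multiple inheritance is used above.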
When developing an ontology, however, it is important to consider the level of
abstraction necessary to meet the needs of the task at hand. Implementing too few
concepts as classes results in repetition of information in individuals. For example,
if the most detailed class implemented was a Computer, individuals in the ontol-
ogy could include DellLaptop, HPLaptop, MacintoshLaptop as well as DellDesktop,
HPDesktop, and MacintoshDesktop. The repetition in these individuals could have
been solved by creating the subclasses Laptop and Desktop. Implementing too many
classes, however, does not allow for the ontology’s data set (i.e. number of individu-
als) to grow without modifying the class structure. If separate classes were created
for DellLaptop, HPLaptop, MacintoshLaptop, etc. then the class structure would
need to be changed for a ToshibaLaptop to be represented, as opposed to a simple
instantiation of Laptop with a property to identify the manufacturer. As a result,
the Semantic Web provides for the ability to make an efficient and scalable knowledge
structure, but it is crucial to be very cognizant of the level of abstraction necessary.
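The trade-off in the computer example can be sketched in Python, using ordinary classes as a loose stand-in for ontology classes. The attribute name manufacturer and the inventory list are assumptions made for this illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Computer:
    # The manufacturer is a property of the individual, not a separate class.
    manufacturer: str


@dataclass(frozen=True)
class Laptop(Computer):
    pass


@dataclass(frozen=True)
class Desktop(Computer):
    pass


inventory: List[Computer] = [Laptop("Dell"), Laptop("HP"), Desktop("Macintosh")]

# A new manufacturer needs only a new individual; the class structure is unchanged.
inventory.append(Laptop("Toshiba"))

laptop_makers = sorted(c.manufacturer for c in inventory if isinstance(c, Laptop))
```

Had DellLaptop, HPLaptop, and so on been separate classes, adding the Toshiba machine would have required a new class definition rather than a single new instance.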
2.2 Reasoning and Inferring
Once data is represented by sets, the possibility of applying reasoning to the data
allows the previously static data to become computable. By “computable,” we mean
that automatic reasoning and drawing of inferences becomes possible. The funda-
mentals of the Semantic Web can be turned into a concrete knowledge representation
called an ontology. As an example, an ontology can be constructed to represent the
members of a family. For a basic Family Member Ontology, a top-level class of Person
is defined. The basic connection between a class and its corresponding subclasses is
the relationship is-a. To represent a family, a FamilyMember can be created as a
sub-class of Person, because every FamilyMember “is a” Person. The FamilyMember
class can contain the subclasses Parent and Child, as shown in Figure 2.1 (note that
all concepts in an ontology automatically belong to the highest level class of Thing).
Figure 2.1: Basic Family Member Ontology Structure
Next, to add additional detail to the ontology, a FamilyMember could be identified
more specifically as a Father, Mother, Son, or Daughter. With these more specific
classes, the concept of gender is introduced. Figure 2.2 shows an expanded ontology,
including these new concepts in addition to the previous generic concepts.
Figure 2.2: More Detailed Family Member Ontology Structure
It becomes clear, however, that Father and Mother really belong as sub-classes of
Parent, and that Son and Daughter belong as sub-classes of Child. Although we
could manually fix this in an example this trivial, allowing the ontology to perform
reasoning on this trivial example demonstrates the possibilities for more complex
ontologies.
In order for the new structure of the ontology to be inferred, relationships must
be defined between classes. To provide relationships between the family members, the
properties isParentOf and isChildOf can be defined. Naturally, a Child isChildOf
a certain Parent and vice versa. The following definitions can be established in the
ontology:
• A Child isChildOf some Parent
• A Parent isParentOf some Child
• A Father isParentOf some Child and hasGender Male
• A Mother isParentOf some Child and hasGender Female
• A Son isChildOf some Parent and hasGender Male
• A Daughter isChildOf some Parent and hasGender Female
With all these relationships defined, the ontology view grows complex
rather quickly, as seen in Figure 2.3.
Figure 2.3: Family Member Ontology Structure with Relationships
Once the asserted relationships have been defined in the ontology, reasoning can
be performed to infer additional relationships. Figure 2.4 shows the new inferred
structure of the ontology (without the isChildOf and isParentOf relationships, for
clarity).
Figure 2.4: Inferred Structure of Family Member Ontology
With its reasoning capabilities, the ontology inferred the correct classification
of Son and Daughter each as a Child, as well as Father and Mother each as a
Parent. As discussed above, it was likely clear from the beginning that Father and
Mother belonged as sub-classes of Parent, and that Son and Daughter belonged
as sub-classes of Child. To demonstrate the reasoning capabilities, however, the
structure was created naively (with Father, Mother, Son, and Daughter simply as
generic FamilyMembers), allowing the ontology to infer the appropriate structure for
us. In a more complicated setting, however, such as determining the appropriate
representation of a person’s “stomach contents” within an anatomical class structure,
these inferencing capabilities make the model extremely powerful.
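The classification of the Family Member Ontology can be imitated in a heavily simplified form. The sketch below is not how a description logic reasoner works internally; it merely treats each class definition from the list in this section as a set of conditions and infers that a class is a sub-class of any other class whose conditions are a proper subset of its own. The data layout and function name are assumptions for illustration.

```python
from typing import Dict, FrozenSet, Set, Tuple

Condition = Tuple[str, str]  # (property, filler), e.g. ("hasGender", "Male")

# Asserted definitions, transcribed from the bulleted list in this section.
DEFINITIONS: Dict[str, FrozenSet[Condition]] = {
    "Child": frozenset({("isChildOf", "Parent")}),
    "Parent": frozenset({("isParentOf", "Child")}),
    "Father": frozenset({("isParentOf", "Child"), ("hasGender", "Male")}),
    "Mother": frozenset({("isParentOf", "Child"), ("hasGender", "Female")}),
    "Son": frozenset({("isChildOf", "Parent"), ("hasGender", "Male")}),
    "Daughter": frozenset({("isChildOf", "Parent"), ("hasGender", "Female")}),
}


def inferred_superclasses(name: str) -> Set[str]:
    """Classes whose conditions are a proper subset of the named class's conditions."""
    own = DEFINITIONS[name]
    return {
        other
        for other, conditions in DEFINITIONS.items()
        if other != name and conditions < own
    }
```

Under these assumptions, Father and Mother are placed beneath Parent, and Son and Daughter beneath Child, mirroring the inferred structure in Figure 2.4.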
The power of ontologies goes far beyond simply inferring class structure (i.e. in-
ferring that Son and Daughter are actually subclasses of Child). Given that classes
in an ontology are akin to mathematical sets, properties such as transitivity, reflexiv-
ity, or inversion can be utilized to perform additional computation. For example, an
individual William might be connected to another individual Paul by the property
isSonOf: William isSonOf Paul. The property isSonOf can be identified as a more
specific form of isChildOf, which is the inverse of another property, isParentOf.
Thus, by stating William isSonOf Paul,
it can automatically be inferred that Paul isParentOf William. Furthermore, if
Paul also possessed the property hasGender Male, we can go beyond knowing that
Paul isParentOf William; the additional gender property allows us to infer that
Paul isFatherOf William, simply by stating (1) that William is Paul’s child, and
(2) Paul is a male. Additionally, it could be inferred that William is also a Male
given that he is a son, and being a male, we can know that he has two distinct (X
and Y) sex chromosomes, and so on.
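The chain of inferences in the William and Paul example can be sketched as a small rule-based loop over subject-property-object triples. This is a hand-written approximation rather than an OWL reasoner, and it assumes that isSonOf is treated as a special case of isChildOf (so that the inverse of isParentOf is exact); the rule set and function name are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)


def infer(facts: Set[Triple]) -> Set[Triple]:
    """Apply the rules repeatedly until no new triples appear."""
    known = set(facts)
    while True:
        new: Set[Triple] = set()
        for s, p, o in known:
            if p == "isSonOf":
                # Assumed rule: a son is a child, and a son is male.
                new.add((s, "isChildOf", o))
                new.add((s, "hasGender", "Male"))
            if p == "isChildOf":
                # isChildOf is the inverse of isParentOf.
                new.add((o, "isParentOf", s))
            if p == "isParentOf" and (s, "hasGender", "Male") in known:
                # A male parent is a father.
                new.add((s, "isFatherOf", o))
        if new <= known:
            return known
        known |= new


asserted = {("William", "isSonOf", "Paul"), ("Paul", "hasGender", "Male")}
inferred = infer(asserted)
```

Only two facts are asserted; the remaining relationships, including Paul isFatherOf William, are derived by the rules.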
Applying these types of inference capabilities to the radiation treatment planning
domain makes it possible to infer relationships that may not have previously been
discovered. The transitivity property, for example, can allow for new correlations to
be discovered as a result of representing the knowledge in a Semantic Web format.
For example, an elevated white blood cell count might never have been previously
considered related to a particular disease. While it is certainly known that correlation
does not always prove causation – and this thesis is not presented as a method of
blindly applying the property of transitivity – this new model provides the framework
for this type of analysis to begin to be investigated.
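To make the role of transitivity concrete, the sketch below computes the transitive closure of a single relation. The names FindingA, ConditionB, and DiseaseC are deliberately abstract placeholders, not clinical claims, and the relation is simply assumed to be transitive for the purpose of the example.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def transitive_closure(pairs: Set[Pair]) -> Set[Pair]:
    """Add (a, c) whenever (a, b) and (b, c) are present, until nothing changes."""
    closure = set(pairs)
    while True:
        extra = {(a, d) for a, b in closure for c, d in closure if b == c}
        if extra <= closure:
            return closure
        closure |= extra


# Placeholder assertions for a relation assumed to be transitive.
asserted_links = {("FindingA", "ConditionB"), ("ConditionB", "DiseaseC")}
all_links = transitive_closure(asserted_links)
```

The pair (FindingA, DiseaseC) was never asserted; it appears only as a candidate connection, which, as noted above, would still need to be investigated rather than accepted blindly.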
2.3 RDF: Resource Description Framework
In the full scope of the Internet, the World Wide Web Consortium (W3C) has agreed
on various specifications for implementing the Semantic Web ideals. These specifica-
tions are contained within a framework known as the Resource Description Framework
(RDF). RDF has become the accepted standard; it relies on Uniform
Resource Identifiers (URIs) to uniquely identify a certain concept or term. For URIs
to eliminate the ambiguity of basic text, a reference is not made to “Thomas Jefferson.”
Rather, RDF borrows from existing web technology; the syntax of a URI is
very similar to that of a URL [2]. For example, a URI might exist as
http://www.organization-a.org/Presidents.owl#ThomasJefferson
In the preceding example, the file “Presidents.owl” might be consolidated informa-
tion about Presidents published as an ontology by Organization A. If the information
in this ontology was found to be useful, accurate, or beneficial, Organization B could
“link” to this source. Unlike traditional hyperlinks to URLs found on the current
World Wide Web, however, the link is not a destination that must be visited in order
to access its data. Rather, the link can be thought of as a process of embedding the
data in the active publication. Thus, Organization B has direct access to the infor-
mation published by Organization A. If Organization A discovers more data about
a President and loads it into their source, Organization B’s page can automatically
reflect those changes, since the data source is linked. Of course, trust
naturally becomes an issue. Organization B must trust Organization A to maintain a
reliable and accurate source of information. This trust issue, however, is no different
from the current World Wide Web. Any web site is free to link to any other web
site, and it is the responsibility of the originating site to determine the validity of the
destination site. Understandably, the National Institutes of Health (NIH) would likely not
link to a middle school student’s article on cancer as a reputable source, although the
NIH is certainly free to do so. The Semantic Web allows for the same freedom, based
on the same issue of trust and reputability.
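The way URIs remove the ambiguity of a shared name can be sketched with plain Python tuples standing in for RDF triples. The first URI is the example given in this section; the second URI, the property names label and type, and the class name Trumpeter are hypothetical and introduced only for this illustration.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

PRESIDENT = "http://www.organization-a.org/Presidents.owl#ThomasJefferson"
# Hypothetical URI for the jazz trumpeter of the same name.
MUSICIAN = "http://www.example.org/Musicians.owl#ThomasJefferson"

TRIPLES: Set[Triple] = {
    (PRESIDENT, "label", "Thomas Jefferson"),
    (PRESIDENT, "type", "President"),
    (MUSICIAN, "label", "Thomas Jefferson"),
    (MUSICIAN, "type", "Trumpeter"),
}


def resources_labeled(label: str) -> Set[str]:
    """All URIs that carry the given human-readable label."""
    return {s for s, p, o in TRIPLES if p == "label" and o == label}


def types_of(uri: str) -> Set[str]:
    """All classes asserted for the resource identified by the URI."""
    return {o for s, p, o in TRIPLES if s == uri and p == "type"}
```

The text label alone matches two resources, while each URI identifies exactly one entity and therefore exactly one set of statements.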
As a more realistic example, a popular restaurant chain might have an information
source with each of its locations. A local city’s website might be interested in listing
restaurants available in the city. If the restaurant chain opens an additional location
in the city, a disconnect between information will occur unless the city webmaster
actively updates the website with the new location. With a Semantic Web model,
however, if the city were able to access the URI of the restaurant chain’s locations, the
information on the city website would always remain accurate with the restaurant’s
records. Translating this example to the process of radiation treatment planning,
the presence of new study data can be included in the knowledge base without any
repetition of previously defined concepts. If desired, calculations can begin including
this data immediately, without the need for a physician or organization to manually
locate and obtain the new study results. This automatic synchronization is what
contributes to the definition of data being “smarter” than static data that can easily
become out-of-date or inaccurate.
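The difference between a copied list and a linked source in the restaurant example can be sketched as follows. The city name, the addresses, and the in-memory dictionary standing in for the chain's published data are all hypothetical; a real deployment would resolve a URI rather than read a local variable.

```python
from typing import Dict, List

# Source maintained by the restaurant chain (all values are invented).
chain_locations: Dict[str, List[str]] = {"ExampleCity": ["12 Main Street"]}

# Approach 1: the city website copies the list once; the copy can go stale.
static_copy: List[str] = list(chain_locations["ExampleCity"])


# Approach 2: the city website reads the linked source each time it is needed.
def linked_view(city: str) -> List[str]:
    return list(chain_locations.get(city, []))


# The chain opens a new location and updates only its own source.
chain_locations["ExampleCity"].append("48 Oak Avenue")
```

After the update, the static copy still lists one location, while the linked view reflects both without any action by the city webmaster.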
It becomes clear that the use of Semantic Web modeling strongly encourages the
sharing and collaboration of data; however, it still allows for the freedom for anyone
to say anything about anything, which is considered to be an absolute necessity for
the World Wide Web to function as it does today [2]. In the example above, if
Organization A publishes inaccurate information about Presidents, nothing
stops Organization B from publishing its own data about Presidents. Then,
as time goes on, the data found to be the most useful, which is likely the most
accurate and complete, will naturally find itself linked to more often, thus aiding in
the collaborative construction of accurate datasets [2].
Creating smarter and more useful data through linking URIs is one substantial
benefit of the Semantic Web, but it is far from the only benefit. As introduced in
Section 2.2, the data being stored starts to become computable. Multiple inheritance
allows an individual to possess properties from several different classes. Defining
relationships with other classes allows for new connections to be inferred through
mathematical properties such as set intersection, union, or transitivity. A variety of
knowledge representation languages have been developed based on the RDF construct,
with varying degrees of capabilities. The basic RDF Schema (RDFS) has a relatively
limited set of properties to work with, such as Domain, Range, and Member.
RDFS-Plus expands upon these properties with set unions and intersections. The
Web Ontology Language (OWL) is an even more powerful language, adding additional
properties such as “someValuesFrom” or “allValuesFrom.” Because of the powerful
inference capabilities of OWL, it is currently used very widely in the development of
Semantic Web knowledge bases [2].
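As a small illustration of the RDFS properties mentioned above, a property's domain and range allow class membership to be inferred: if a property with domain C links a subject to an object, the subject is a member of C, and the object is a member of the range class. The sketch below applies that rule to the family example; assigning Parent and Child as the domain and range of isParentOf is an assumption made here, not something stated in the ontology.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Assumed domain and range declarations for the family example.
DOMAIN: Dict[str, str] = {"isParentOf": "Parent"}
RANGE: Dict[str, str] = {"isParentOf": "Child"}


def infer_types(triples: Set[Triple]) -> Set[Tuple[str, str]]:
    """Return (individual, class) pairs implied by domain and range declarations."""
    types: Set[Tuple[str, str]] = set()
    for s, p, o in triples:
        if p in DOMAIN:
            types.add((s, DOMAIN[p]))  # the subject belongs to the domain class
        if p in RANGE:
            types.add((o, RANGE[p]))  # the object belongs to the range class
    return types


family_types = infer_types({("Paul", "isParentOf", "William")})
```

Neither Paul nor William was explicitly assigned to a class; both memberships follow from the single asserted relationship.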
2.4 OWL: Web Ontology Language
OWL provides developers with a way to represent knowledge in an extremely useful
format – much more useful than typical narrative text. With access to the full set
theory language (intersections, unions, complements, etc), classes can be built by
defining restrictions on membership, such as “States that border an ocean,” or “Cells
that participate in striated muscle contraction.” With this type of set inclusion (or
exclusion), knowledge begins to become computable. With data being entered into
a Semantic Web model, new analyses can be performed simply by manipulating the
restrictions on various classes.
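A class defined by a restriction, such as “States that border an ocean,” can be approximated with set operations. The sketch below is a closed-world simplification: OWL reasons under an open-world assumption, so the complement computed here would not be concluded by an OWL reasoner without further axioms. The relation name BORDERS, the small sample of states, and the helper function are assumptions for illustration.

```python
from typing import Dict, Set

OCEANS: Set[str] = {"PacificOcean", "AtlanticOcean"}

# A small sample of "borders" relationships, not a complete data set.
BORDERS: Dict[str, Set[str]] = {
    "California": {"Oregon", "Nevada", "Arizona", "PacificOcean"},
    "Nevada": {"California", "Oregon", "Idaho", "Utah", "Arizona"},
    "Maine": {"NewHampshire", "AtlanticOcean"},
}


def some_values_from(relation: Dict[str, Set[str]], filler: Set[str]) -> Set[str]:
    """Subjects related to at least one member of the filler class."""
    return {s for s, objects in relation.items() if objects & filler}


states_bordering_ocean = some_values_from(BORDERS, OCEANS)
# Complement within the sampled states only (closed-world simplification).
states_not_bordering_ocean = set(BORDERS) - states_bordering_ocean
```

Changing the filler set, for example to a single ocean, yields a different class without touching the underlying data, which is the kind of manipulation of restrictions described above.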
In addition to the application of set theory concepts, OWL also provides the
ability to define cardinality restrictions on the data, such as “States that border
more than 3 other states,” or “Radiation treatments causing ulceration lasting longer
than 7 days.” With cardinality restrictions, OWL provides a system that can offer
specialization in retrieving relevant data. The need to sift through literature for data
relating to a specific patient, such as an infant, can essentially be eliminated; a query
to the knowledge base can be limited by practically any factor desired. OWL served as
the foundational language for building the Radiation Treatment Guideline Ontology
(RTGO), as discussed in Chapter 4.
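The membership and cardinality restrictions described above can be sketched with ordinary Python sets. This is only an illustration of the idea, not OWL itself; the names `borders` and `touches_ocean` and the three sample states are toy data assumed for the example.

```python
# Toy data (assumed for illustration): neighbors of each state and
# the set of states that touch an ocean.
borders = {
    "Oregon": {"Washington", "Idaho", "Nevada", "California"},
    "Nevada": {"Oregon", "Idaho", "Utah", "Arizona", "California"},
    "Hawaii": set(),
}
touches_ocean = {"Oregon", "Hawaii"}

states = set(borders)

# Membership restriction: "States that border an ocean".
coastal_states = states & touches_ocean

# Cardinality restriction: "States that border more than 3 other states".
many_neighbors = {s for s, nbrs in borders.items() if len(nbrs) > 3}

# A new class is obtained simply by combining existing restrictions.
coastal_with_many_neighbors = coastal_states & many_neighbors
```

Changing the threshold or swapping the intersection for a union yields a different class without touching the underlying data, which is the sense in which new analyses come from manipulating restrictions.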
2.5 Applicability of Semantic Web Principles for
Treatment Planning
Representing data within the Semantic Web, a model that defines entities and
their relationships, as opposed to static literary publications, not only provides
more direct and comprehensive access to data, but also offers the ability to uncover
new conclusions or relationships that previous knowledge representations could not
produce. In the domain of radiation treatment planning, this allows a system to
be developed in which more relevant and specific study data can be recorded,
such as the effects of radiation on lung tissue in a patient suffering from
emphysema. Once this framework is in place, the knowledge stored in an ontology
(1) becomes computable, with new correlations to be discovered between patient
characteristics previously thought to be unrelated, and (2) enables specialization,
with relationships allowing for the extraction of more relevant data.
2.6 Comparing Ontologies to Relational Databases
Comparisons are often drawn between ontologies and relational databases. Conceptually,
the two are relatively similar: both provide a way to store data in an
organized format and define relationships between the data. The composition of an
ontology, however, can be viewed as “semi-structured natural language texts” as op-
posed to simply tabular databases [11]. The statement from the example above, Paul
isFatherOf William, demonstrates this semi-structured text. In a way, an ontology
can be viewed as a data model that provides an interface to a data source, such
as a relational database. Ontologies, however, do not simply store data and pro-
vide explicitly defined relationships between the data. Rather, ontologies allow for
queries that can “reason about the asserted facts and retrieve new facts implied by
the known facts” [11]. The inherent ability for re-use and interoperability also makes
a very strong case for ontologies over relational databases. Databases are frequently
developed by organizations for internal purposes and are not geared towards the reuse
of data in the future. Since the nature of ontologies is based on representing concepts
in terms of their actual meaning, it is much easier to consider the possibility of reusing
part or all of an existing ontology.
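The difference between storing facts and reasoning over them can be sketched with a small forward-chaining loop in Python. This is a minimal illustration, not how an OWL reasoner is implemented. The fact Paul isFatherOf William comes from the earlier example; the individual Henry, the properties isParentOf and isAncestorOf, and the three rules are assumptions made for the sketch.

```python
def infer(asserted):
    """Apply three illustrative rules until no new facts appear."""
    facts = set(asserted)
    while True:
        new = set()
        for s, p, o in facts:
            # Rule 1 (assumed): isFatherOf is a sub-property of isParentOf.
            if p == "isFatherOf":
                new.add((s, "isParentOf", o))
            # Rule 2 (assumed): every parent is an ancestor.
            if p == "isParentOf":
                new.add((s, "isAncestorOf", o))
        # Rule 3 (assumed): isAncestorOf is transitive.
        for s1, p1, o1 in facts:
            if p1 != "isAncestorOf":
                continue
            for s2, p2, o2 in facts:
                if p2 == "isAncestorOf" and o1 == s2:
                    new.add((s1, "isAncestorOf", o2))
        if new <= facts:
            return facts
        facts |= new


asserted = {
    ("Paul", "isFatherOf", "William"),
    ("William", "isFatherOf", "Henry"),  # hypothetical second fact
}
inferred = infer(asserted)
```

The triple (Paul, isAncestorOf, Henry) was never asserted, yet it appears in `inferred`. A relational query over the two asserted rows alone would not return it unless the join logic were written by hand.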
Chapter 3: Radiation Treatment Guideline Ontology
The ontology developed in this thesis is the Radiation Treatment Guideline Ontology
(RTGO), which was based on studies aimed at obtaining data on the side-effects
of radiation treatments [4, 17, 27]. These studies have been published in various jour-
nals, and are often focused on the effects when treating a specific type of cancer, such
as prostate cancer or bladder cancer. An ontology was used to store all the concepts
contained within these papers to reap the benefits of modeling data in this structure,
as discussed in Chapter 2.
3.1 Radiation Treatment Resources
As a starting point for determining the structure of RTGO, several literary resources
were used. While many papers have been published with results from studies on
the effects of radiation treatments, there are organizations that consolidate these
studies and publish summaries. The American Association of Physicists in Medicine
produces publications labeled as Quantitative Analysis of Normal Tissue Effects in
the Clinic (QUANTEC). Many organ-specific QUANTEC papers have been produced,
providing summaries of radiation treatment studies on a specific organ. These papers
are specifically focused on dose-volume effects, which analyze the maximum dose
recommended that a certain volume of the organ should be exposed to. To build
RTGO and begin a structure allowing for knowledge computability and specialization,
two organ-specific QUANTEC papers were obtained, analyzing dose-volume effects
on the bladder and on the rectum [17, 27].
3.1.1 Existing High-Level Ontologies
One of the great advantages of ontologies is that they can be structured together,
with new, domain-specific ontologies being built underneath an existing, high-level
ontology. The biomedical field has many high-level ontologies already developed, such
as the Foundational Model of Anatomy (FMA) and the Systematized Nomenclature of
Medicine (SNOMED). Data from both the FMA and SNOMED were accessed using
the National Center for Biomedical Ontology (NCBO) BioPortal [18].
Developed by the University of Washington School of Medicine, the FMA is an
ontology focused on the domain of anatomy. The FMA contains approximately 75,000
classes with over 120,000 terms and 2.1 million relationship instances, making it one
of the largest ontologies in the field [9]. The scale of ontologies like
the FMA makes clear that collaboration and reuse of knowledge are among the
greatest assets of ontologies. It would likely be foolish to consider re-developing an
entire anatomical ontology (unless, for example, the FMA was inaccurate and one
wanted to recreate the ontology from scratch). For development of RTGO, the high-
level classifications for anatomical concepts were implemented in the same structure
as in the FMA.
While the FMA is a relatively specific ontology, SNOMED is wider-scale, cover-
ing many aspects of clinical medicine, including pathology and treatment concepts.
SNOMED was originally designed by the College of American Pathologists (CAP),
but it is now owned and maintained by the International Health Terminology Stan-
dards Development Organization (IHTSDO). With just under 400,000 classes defined,
SNOMED is a very comprehensive ontology [23]. Much of the clinical terminology
(e.g., blood pressure) and pathological terminology (e.g., malignant growth) in RTGO
was implemented using the same class hierarchy as SNOMED.
3.2 Concept Extraction from QUANTEC Papers
To accurately represent all the knowledge contained within the QUANTEC papers, a
very meticulous process of concept extraction was performed. Every concept (such as
bleeding, blood, blood cells, surgical procedures, injuries, etc.) needed to be extracted
in order to model the knowledge correctly. Although techniques for automated con-
cept extraction using natural language processing are in development, these methods
are not foolproof yet [5]. In the future, concept extraction for ontologies like RTGO
could potentially make use of automated concept extraction software, but for the
current study, five or more manual passes were performed on each paper to extract
concepts contained within. As an example, a total of 351 concepts were extracted
from the QUANTEC paper on the bladder [27], and using the FMA and SNOMED
ontologies as a guide, the concepts were organized in outline format. The two highest
levels in the hierarchy of concepts are shown in Figure 3.1. At their deepest, the
concepts currently in RTGO lie nine layers below the top-level concepts.
Once the concepts were laid out on paper, they were added as classes to the ontol-
ogy. Figure 3.2 shows a portion of the structure of these top two levels in a graphical
representation of the ontology. To maintain clarity, only the classes in the ontology
were included in the figure, but dozens of relational properties interconnect the ma-
jority of these classes with each other. Although this process of concept extraction
was relatively time-consuming for the two QUANTEC papers, future extraction will
be significantly easier. Because of the reusability of existing terms in an ontology,
when reviewing a new paper, only concepts that were brand new would need to be
implemented. In the case of the QUANTEC papers, 351 concepts were extracted
from the paper on the bladder, while only 48 additional concepts were added to the
ontology to represent the knowledge contained within the paper on the rectum. It is
this elimination of repetition that provides ontologies with much greater scalability
compared to the current approach of literary publications.
Figure 3.1: Extraction of Concepts from QUANTEC Papers
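The reuse effect described above (351 concepts from the bladder paper, then only 48 additional concepts for the rectum paper) amounts to a set difference. The following Python sketch illustrates the idea with a handful of made-up concept names; the function `add_paper_concepts` and the sample sets are hypothetical and are not taken from RTGO.

```python
def add_paper_concepts(ontology_concepts, paper_concepts):
    """Add only the concepts not already present; return the new ones."""
    new_concepts = set(paper_concepts) - ontology_concepts
    ontology_concepts |= new_concepts
    return new_concepts


ontology = set()

# Toy concept lists standing in for the concepts extracted from each paper.
bladder_paper = {"Bladder", "Bleeding", "Dose", "Volume", "Endpoint"}
rectum_paper = {"Rectum", "Bleeding", "Dose", "Volume", "Endpoint"}

added_first = add_paper_concepts(ontology, bladder_paper)
added_second = add_paper_concepts(ontology, rectum_paper)
```

In this toy case the first paper contributes five concepts and the second only one, mirroring on a small scale the drop from 351 to 48.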
3.3 Challenges Building the structure of RTGO
As introduced in Section 2.1, care must be taken to determine the level of abstraction
necessary for the domain at hand. Although there is no one correct structure
for determining the level of abstraction, one structure may represent the data more
easily than another. The primary consideration is to determine whether a concept
(e.g., stomach, publication, electromagnetism) should be represented as a class or an
individual.
Figure 3.2: Graphical Representation of Top Two Layers of RTGO
One example of this decision for RTGO was determining how to represent a body
organ like a lung. As a class, Lung allows for other concepts to be associated with
it, such as: Emphysema occursIn Lung. As an individual, Lung could be an instan-
tiation of the generic class Organ. There is no universally correct answer to this
issue - depending on the needs and desired functionality of the ontology, Lung could
be implemented either as its own class, or as an instantiation of an organ. Similarly,
MalignantCell could be represented as either a sub-class of the Cell class, or it could
simply be an instantiation of the Cell class, with a descriptive property isMalignant
containing the boolean value of True. In general, concepts were added to RTGO as
classes whenever possible. Only when it was clear that no further sub-division of the
concept would likely be introduced was a concept added as an individual.
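The class-versus-individual decision has a loose analogue in object-oriented code, sketched below in Python. The analogy is imperfect, since OWL classes are not programming-language classes, and every name here (`Cell`, `MalignantCell`, `CellIndividual`, `MalignantEpithelialCell`) is chosen only to mirror the example in the text.

```python
from dataclasses import dataclass


class Cell:
    """Generic concept."""


# Option 1: the concept is a class of its own (a sub-class of Cell).
class MalignantCell(Cell):
    pass


# Option 2: the concept is an individual of Cell carrying a descriptive
# property, analogous to isMalignant = True.
@dataclass
class CellIndividual:
    name: str
    is_malignant: bool


sample_a = MalignantCell()
sample_b = CellIndividual(name="cell_b", is_malignant=True)


# Only option 1 leaves room for further sub-division later on.
class MalignantEpithelialCell(MalignantCell):  # hypothetical refinement
    pass
```

This is the reasoning behind preferring classes whenever further sub-division is plausible: a sub-class can be refined again, whereas an individual is an end point in the hierarchy.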
Chapter 4: RTGO GUI Plug-In Development
The Protege Ontology Editor and Knowledge Acquisition System was chosen as
the software to be used to develop RTGO and an associated GUI Plug-In. Protege is
a free, extensible, Java-based, open-source ontology editor developed at the Stanford
Center for Biomedical Informatics Research. Continued development on the Protege
project is supported by a grant from the United States National Library of Medicine
[24].
4.1 Protege Overview
Protege is a very widely-used application for developing ontologies, with an active
support community of over 176,000 users as of July 2011. Being Java-based, Protege
is platform independent, allowing ontologies to be built and distributed regardless
of the operating system of the user [24]. Simply, Protege provides a graphical user
interface through which the user is able to develop an ontology. Figures 4.1 and 4.2
show the standard Protege version 4.1 interface, which was used for development1.
4.2 Developing Plug-Ins for Protege
The extensible nature of Protege allows any user to develop a plug-in for the pro-
gram. The Protege Programming Development Kit (PDK) provides documentation
and sample code for developers to utilize when developing plug-ins [21]. The Eclipse
IDE (version Helios, Service Release 1) was used for coding the RTGO plug-in. It is
common for plug-ins to be developed for domain-specific ontologies, since the applicability
of the ontology rests upon providing quick access to the most usable aspects
of the ontology. In the case of RTGO, it was desirable to provide quick access to the
dose-volume relationships from previous studies. When Protege is run with the RTGO
plug-in installed, a new tab is added to the main Protege interface, allowing the
user to activate the plug-in. This tab can be seen in Figure 4.3.
Figure 4.1: Protege Interface
1 Previous versions of Protege were focused around frames, which allow for similar representation of classes, relationships, and individuals, but lack the inferencing capabilities of the Web Ontology Language (OWL). Current versions of Protege create files in OWL format, which brings reasoning capabilities to the ontology being developed.
4.3 RTGO GUI Plug-In
Although hundreds of concepts need to be included in the ontology in order to accu-
rately model radiation treatment planning, the primary focus in treatment planning
studies is often to reach a dose-volume recommendation. In other words, the study
aims to recommend certain doses of radiation that should be applied to a particular
volume of the organ at hand. The GUI built on top of RTGO focuses directly on
these dose-volume relationships, allowing for dose-volume graphs to be displayed for
a particular organ.
Figure 4.2: Protege Interface Editing Class Hierarchy of RTGO
Figure 4.3: Accessing the RTGO Plug-in from the Protege Interface
The QUANTEC papers produce recommendations based on conclusions drawn
from multiple individual studies, but since one of the goals of RTGO is to improve
the capacity for specialization, dose-volume recommendations from individual studies
are plotted as well. Figure 4.4 demonstrates the RTGO GUI plug-in interface, with
a plot of the QUANTEC recommended dose-volume curve on the left, and a plot of
multiple dose-volume recommendations from individual studies on the right.
Figure 4.4: RTGO Plug-in Interface
4.4 Dose-Volume Relationships
As the interface of the RTGO GUI plug-in is centered around displaying dose-volume
relationships, determining how to represent “Volumes” and “Doses” in the ontology
is of pivotal importance. In general, the dose of a particular radiation treatment is
measured in the unit of grays (abbreviated: Gy). One Gy is defined as the absorption
of one joule of radiation by one kilogram of matter:
$$1\,\mathrm{Gy} = \frac{1\,\mathrm{J}}{1\,\mathrm{kg}} \qquad (4.1)$$
When a certain volume (or all) of an organ is exposed to radiation, this can be
viewed as a dose-volume relationship. A basic dose-volume plot can be constructed
relating the dose (in Gy) to the percent volume being exposed to the dose, as shown
in Figure 4.5.
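As a rough illustration of these two quantities, the Python sketch below computes an absorbed dose from Equation 4.1 and one point of a cumulative dose-volume curve. It assumes the organ is divided into equal-volume sample elements, each with a known dose; that discretization, the function names, and the sample numbers are assumptions made for the example and are not taken from the QUANTEC papers.

```python
def absorbed_dose_gy(energy_joules, mass_kg):
    """Equation 4.1: dose in Gy is absorbed energy (J) per unit mass (kg)."""
    return energy_joules / mass_kg


def percent_volume_at_or_above(element_doses_gy, threshold_gy):
    """Percent of equal-volume elements receiving at least threshold_gy."""
    count = sum(1 for d in element_doses_gy if d >= threshold_gy)
    return 100.0 * count / len(element_doses_gy)


# Toy dose samples (Gy) for four equal-volume elements of an organ.
element_doses = [10.0, 20.0, 30.0, 40.0]

# One (dose, percent volume) point per threshold gives a dose-volume curve.
curve = [(t, percent_volume_at_or_above(element_doses, t))
         for t in (0.0, 25.0, 50.0)]
```

Each pair in `curve` is one plotted point: the dose on one axis and the percent of the organ volume receiving at least that dose on the other.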
Given that the RTGO interface is primarily designed to display these Dose-Volume
Relationships to the user, careful consideration must be taken regarding the quantities
actually being graphed. Although it initially may not seem complicated, the
definition of organ volumes, radiation doses, and even side effects
is rather complex. Building a tool that displays graphs based around these concepts
requires an understanding of what each measurement or description truly represents.
Figure 4.5: Dose-Volume Histogram for Radiation Effects on the Rectum [17]
4.4.1 Challenges Defining Organ Volume
Attempting to estimate the volume of an organ receiving a particular dose is far
from a trivial issue. At the outset, the mere definition of a particular organ may
not always be explicitly definable [17]. For example, the concept of an “esophagus”
might seem simple enough, often defined as the organ through which food passes
from the pharynx to the stomach. It is not necessarily defined, however, whether the
definition of the esophagus should include the esophageal walls. Along a similar line of
thinking, a representation of a stomach or rectum could be either with or without its
contents. These are uncertainties that, to a certain degree, have been answered over
many years in the development of the FMA and SNOMED, but they demonstrate
the complexities of concept extraction in developing an ontology.
Even when the physical definition of an organ has been established, other issues
arise as well. Many organs, such as the stomach, rectum, or lungs, are distensible,
meaning that they can swell when filling with liquid, solid, or gaseous contents. Thus,
even in the same individual many organs do not have a fixed volume. Rather, their
volumes can fluctuate based on contents, body position, physical exertion, or normal
bodily functions such as digestion. Some of these fluctuations are slow, during normal
body growth and development, but some are more rapid, such as the lungs inflating
and deflating with each breath [27].
Although these and other challenges are faced when determining an organ’s vol-
ume, these complications have been analyzed extensively [14, 17, 27] in order to reach
the closest possible approximations in data being reported. The specifics of reaching
these volume approximations extend beyond the scope of this paper, so the volume
data reported in the QUANTEC and other studies are taken to be sufficiently accu-
rate. For the development of RTGO, the issue of the definition of organ volume is
left up to the institutions carrying out the studies.
4.4.2 Challenges Defining Radiation Doses
Similar to defining an organ volume, many difficulties are encountered when determining
the actual dose to which an organ is exposed. Although the electromagnetic
wave may be delivered at a particular dose, absorption, dissipation,
and dispersion of the energy begin immediately.
Very complex calculations are performed to analyze the precise geometric factors
in planning the radiation absorption. Determining the dose in a planned target volume
(PTV) for the treatment involves arranging radiation beams into 7 or more fields and
calculating the appropriate dose fractions to be applied by each field – applied over
the course of many days, a total dose of 70 Gy might be divided into fractions of 2.12
Gy [10].
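The fractionation arithmetic in this example can be checked in a few lines of Python. The total dose and the fraction size come from the text; the number of fractions is derived here by rounding and is not stated in the source.

```python
total_dose_gy = 70.0         # total prescribed dose from the example
dose_per_fraction_gy = 2.12  # fraction size from the example

# Nearest whole number of fractions (derived, not given in the text).
n_fractions = round(total_dose_gy / dose_per_fraction_gy)

# Dose actually delivered by that many fractions, about 69.96 Gy.
delivered_gy = n_fractions * dose_per_fraction_gy
```

Under these assumptions the schedule works out to 33 fractions.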
Similarly, determining the “true dose” being applied to the tissue (DA) is an
extremely complex process. During the course of treatment, DA can change as both
the tumor and normal tissue are deformed by the radiation. Once the radiation enters the body,
each type of tissue (e.g., skin, blood vessels, muscle, etc.) absorbs the energy at a different
rate. Since the radiation may need to pass through multiple different types of tissue
before reaching the desired target organ, very complex calculations need to be made
to determine the actual dose reaching the organ [14].
As with volume definitions, dose determinations are approximated as closely as
possible, but are not absolutely precise. Again, these dose calculations are beyond
the scope of this paper, but they have been studied extensively [14] and are taken to
be sufficiently accurate.
4.4.3 Challenges Defining Side Effects of Radiation
The determination of how side effects should be represented in the Semantic Web is
yet another process that takes careful consideration. Following Bentzen’s definitions,
endpoints can be defined as “health state characteristics that are used to assess
treatment outcome in a population of patients” [4]. These endpoints can be symptoms,
signs, or actual measurements obtained from the patient. When dealing with
normal tissue effects, endpoints are side effects that affect “health-related quality of
life” [4].
Understandably, however, there can be a wide range of severity for a particular
endpoint. The endpoint bleeding, for example, may be a simple drop of blood or a
loss of fatal amounts of blood. Due to the subjectivity of endpoints, representing the
varying levels of side effects can be difficult, but several standards organizations have
published guidelines for classifying endpoints.
4.4.4 CTCAE: Common Terminology Criteria for Adverse
Events
The Common Terminology Criteria for Adverse Events (CTCAE) is one such publi-
cation that is frequently used [17] for reporting clinical observations during studies.
Figure 4.6 shows part of the CTCAE publication, version 3.0. Five levels, or Grades,
are given for a specific adverse event, with 1 being the least severe and 5 being the
most severe (usually death).
The QUANTEC papers used in building RTGO presented side effect summaries
using CTCAE guidelines (typically, any effects classified as greater than or equal
to level 2 were included in statistics), but other similar publications for classifying
endpoints are in existence. Guidelines from the Radiation Therapy Oncology Group [17] and the Late
Effects of Normal Tissues-Subjective, Objective, Management and Analytic system
(LENT-SOMA) [27] have also been published, but are not used as widely as CTCAE [17].
With the nature of the expandable ontology, however, RTGO has the capability to
include these, or other, endpoint classification guidelines if studies are produced with
results reported using a guideline other than CTCAE.
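The grade threshold mentioned above (effects of Grade 2 or higher counted in the statistics) can be sketched as a simple filter in Python. The `AdverseEvent` record, the helper functions, and the sample events are hypothetical; only the 1 to 5 grade scale and the threshold of 2 come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdverseEvent:
    patient_id: str
    endpoint: str
    grade: int  # CTCAE grade, 1 (least severe) to 5 (most severe)


def events_at_or_above(events, min_grade=2):
    """Keep only the events whose grade meets the reporting threshold."""
    return [e for e in events if e.grade >= min_grade]


def incidence(events, n_patients, min_grade=2):
    """Fraction of patients with at least one qualifying event."""
    affected = {e.patient_id for e in events if e.grade >= min_grade}
    return len(affected) / n_patients


# Toy observations for a hypothetical four-patient study.
observed = [
    AdverseEvent("p1", "bleeding", 1),
    AdverseEvent("p2", "bleeding", 2),
    AdverseEvent("p2", "ulceration", 3),
    AdverseEvent("p3", "bleeding", 4),
]
reported = events_at_or_above(observed)
rate = incidence(observed, n_patients=4)
```

Supporting another classification guideline would mainly mean adding a field that names the scale, so that grades from different guidelines are never compared directly.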
4.4.5 Acute vs. Late Endpoints
Even with the classification of endpoints into grade levels according to severity, signif-
icant consideration must be given to the time when the endpoint occurs. Endpoints
are generally divided into Acute Endpoints and Late Endpoints. The former refers
to an endpoint that occurs immediately after the treatment is administered, while
the latter refers to long-term effects, occurring months or years after the treatment.
Late endpoints are often the most difficult to obtain data about, because patients are
often no longer being actively treated for the original issue or are being treated by a
different physician [13].
Figure 4.6: CTCAE Version 4.0 Classification of Gastrointestinal Adverse Events [7]
This aspect of late endpoints demonstrates another drawback to static, literary
publications. If a study has already been published, and a summary paper has already
been published based on the original study, the data in that summary paper is static.
With the dynamic nature of an ontology, however, new data can be entered into the
system, and new conclusions or inferences can be made immediately.
4.5 Features of RTGO
The RTGO Plug-in interface provides a GUI that queries data stored in the ontology.
Both QUANTEC summary data and individual study data are contained within the
ontology. Every study contains dose-volume data points to construct the correspond-
ing curve on the Dose-Volume histogram.
4.5.1 Filters
In order to meet the goal of modeling data in a manner conducive to specialization
(and given that there are many different factors that go into a radiation treatment
study), the ability to filter data according to specific parameters is crucial. The
implementation of search filters leads to providing a physician with relevant data,
assisting in the development of the best possible treatment plan. RTGO provides two
key filters: a date range filter and a study size filter.
Since new discoveries in the medical field might lead to some older studies being
less relevant, the papers returned from the query can be limited by their date of
publication. Similarly, if a physician wants to exclude small (or large) studies, the
number of patients in the study can also be filtered. These two filters demonstrate
the power of representing data in an ontology. Many additional filters could
be implemented, allowing for an even more powerful tool. Attempting to locate
specialized data relevant to the specific case at hand is rather burdensome when
navigating literary publications; however, with an ontological storage model, access
to fine-tuned data is nearly instantaneous.
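The two filters can be sketched in Python as follows. This is not the plug-in's actual implementation, which is written in Java against the Protege API; the `Study` record, the `filter_studies` function, and the sample studies are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Study:
    title: str
    published: date
    n_patients: int


def filter_studies(studies, earliest=None, latest=None,
                   min_patients=None, max_patients=None):
    """Return the studies matching every bound that was supplied."""
    selected = []
    for s in studies:
        if earliest is not None and s.published < earliest:
            continue
        if latest is not None and s.published > latest:
            continue
        if min_patients is not None and s.n_patients < min_patients:
            continue
        if max_patients is not None and s.n_patients > max_patients:
            continue
        selected.append(s)
    return selected


# Toy studies (hypothetical).
studies = [
    Study("Study A", date(1995, 3, 1), 40),
    Study("Study B", date(2005, 6, 1), 250),
    Study("Study C", date(2009, 1, 15), 1200),
]

recent_mid_sized = filter_studies(
    studies, earliest=date(2000, 1, 1), min_patients=100, max_patients=1000
)
```

Any further filter follows the same pattern: one more optional bound and one more comparison against a property already stored with each study.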
4.5.2 Additional Study Details
While the dose-volume data is the most straightforward piece of data gleaned from a
radiation treatment study, there are many factors contained within the papers that
may be of use to the user. The Semantic Web model of an ontology facilitates access
to this information: each paper is represented as an instance of the class Publication.
All bibliographic information about the paper is also contained within the Publication
instances, including details about how the study was performed (represented as a
Study Protocol), containing data such as the number of fractions into which the radiation
dose was separated. Using relational properties (e.g., hasOutcome), the Study
Protocol can be linked to specific Endpoints observed during the study.
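The chain from a Publication to its Study Protocol and on to the observed Endpoints can be pictured with the Python sketch below. The class names follow the text, and `has_outcome` mirrors the hasOutcome property; all other field names and the sample values are assumptions, and the real ontology stores these as OWL individuals and properties rather than Python objects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Endpoint:
    name: str


@dataclass
class StudyProtocol:
    n_fractions: int  # number of fractions the dose was split into
    has_outcome: List[Endpoint] = field(default_factory=list)


@dataclass
class Publication:
    title: str
    year: int
    study_protocol: StudyProtocol


def endpoints_for(publication):
    """Follow Publication -> Study Protocol -> hasOutcome -> Endpoint."""
    return [e.name for e in publication.study_protocol.has_outcome]


# Hypothetical example instance.
protocol = StudyProtocol(n_fractions=33, has_outcome=[Endpoint("Bleeding")])
paper = Publication(title="Example study", year=2008, study_protocol=protocol)
outcomes = endpoints_for(paper)
```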
4.5.3 Direct Access to Abstracts and Full Papers
Although an ideal knowledge extraction process will represent every aspect contained
in the narrative text of a paper, there are times when the actual text of the publication
may need to be referenced - either solely the abstract or the publication in full. In
RTGO, when a search is performed based on parameters specified in the filters, a list
of links is displayed, allowing direct access to both the abstract and full text of
the paper. Figure 4.7 demonstrates this feature.
Chapter 5: Software Engineering Analysis
In the development of any computer software, certain evaluations should be per-
formed in order to ascertain how well the application meets the intended needs. While
there is no single methodology for evaluating software’s correctness, efficiency, and
scalability, the International Organization for Standardization (ISO) 9126 standard for software
quality evaluation is one of the most commonly used benchmarks for this task
[12]. The ISO 9126 standard outlines a variety of internal, external, and in-use quality
metrics to be evaluated, including (1) accuracy/functionality – whether or not the
data presented by the program is accurate and functions properly, (2) efficiency – how
efficiently the program meets the needs of the user, and (3) scalability/maintainability
– how feasible growth is within the program and the data.
The ISO 9126 standard has been used in the evaluation of a variety of software
designs [6, 1], and is recommended for a variety of reasons: (1) it is a very com-
prehensive standard, evaluating a wide variety of aspects of software; (2) it places
strong emphasis on the needs of users, using wording that is common and understood
by users, developers, and evaluators alike; (3) it provides for an objective analysis
of software quality; and (4) it establishes a reproducible process of evaluation [8].
Although there is always room for improvement, RTGO scores well in an evaluation
based on the ISO 9126 standard.
5.1 Software Engineering Dimensions
Using the ISO 9126 standard, three key dimensions were analyzed: accuracy, effi-
ciency, and scalability. Access to RTGO (both the structured ontology and the GUI
plug-in) was provided to medical physicists trained in radiation treatment planning
to analyze and evaluate the ontology in the three key dimensions. As the true focus
of this thesis is on the underlying ontology, the evaluation focuses primarily on how
well the ontology serves as a new model of data representation. The GUI plug-in pro-
vides easy access to the data contained within the ontology, but it was the ontological
representation of the data that was primarily being evaluated.
5.1.1 Accuracy
In the case of RTGO, or any biomedical ontology, accuracy of information being
represented is critical. The terminology and data must be modeled with no errors or
omissions. Stvilia defines accuracy as “The extent to which information is legitimate
or valid according to some stable reference source, such as a dictionary or set of
domain constraints and norms” [25]. For the radiation treatment planning domain,
accurate translation of concepts from the QUANTEC papers into concepts in the
ontology is the primary metric. Any concepts contained in RTGO must be valid
according to the source.
5.1.2 Efficiency
Defined as the “[r]elationship between the level of performance of the software and
the amount of resources used, under stated conditions, taking into account elements
such as the time response, or memory consumption” [8], efficiency is a metric in which
RTGO aims to provide substantial gains for the radiation treatment planning domain.
Although efficiency is often a metric used to quantify the algorithmic efficiency and/or
resource consumption of a machine, a slight adaptation to the definition was utilized
in the analysis of RTGO. While the algorithmic nature of the software is certainly
not insignificant, the goal of RTGO is to provide an increase in human efficiency of
data access. Although efficiency viewed in this light is not as concretely quantifiable
(with no static source as in accuracy measurement), evaluators were asked to rate the
ontological structure of RTGO based on the anticipated increase or decrease in the
amount of time required to access data using RTGO compared with their previous
clinical experiences referencing literary publications.
5.1.3 Scalability
Scalability is defined as “[a]ttributes that bear on the effort needed to make specified modifications”;
a data storage model is not useful if it cannot easily accommodate new data
being added over time. Due to the nature of the Semantic Web, this often-challenging
metric is one where ontologies can prove to be very scalable. As in the example of
the Presidents ontology maintained by an existing organization, the reuse of in-
formation aids substantially when it comes to scalability; previous information and
existing structures never have to be reinvented, and – assuming the ontology struc-
ture is designed appropriately – new information fits right into the model. Evaluators
were asked to rate the ontological structure of RTGO based on the anticipated ease
of adding additional studies, concepts, and data into the ontology.
5.2 ISO 9126 Analysis of RTGO
Similar to other ontology analysis studies [8], a simple evaluation of the ontology
was performed by individuals already knowledgeable about the design process and
implementation of an ontology. Four medical physicists experienced in radiation
treatment planning participated in the study. Each was given approximately 60 minutes
to study the structure and features of RTGO. Each evaluator was provided with a
form containing the definitions of each dimension, as quoted in Sections 5.1.1–5.1.3.
On this form, evaluators were asked to quantitatively rate the three software design
dimensions from 1 (worst) to 5 (best). In addition, evaluators were able to give
open-ended comments on each of the dimensions.
5.3 Software Engineering Analysis Results
The scores from the evaluators were collected and averaged. A summary of the
numeric scores from each evaluator is presented in Table 5.1. The results were also
plotted on a radar graph, which allows for easy visualization of each evaluator’s scores
compared to the possible range of values (1-5) and with respect to each other. Each of
the three axes on the graph represents one of the software quality dimensions being evaluated:
accuracy, efficiency, and scalability. Each axis runs from a rating of 1 at the origin
(lowest score) to a rating of 5 at the apex (highest score). The results of the evaluation
are presented in Figure 5.1.
Table 5.1: ISO 9126 Evaluation of RTGO
Dimension     Evaluator 1   Evaluator 2   Evaluator 3   Evaluator 4   Average
Accuracy      5             5             5             4             4.75
Efficiency    4             4             5             5             4.5
Scalability   3             3             4             3             3.25
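The averages in Table 5.1 can be reproduced from the individual ratings with a short Python check. The ratings are copied from the table; the variable names are chosen for the example.

```python
from statistics import mean

# Ratings from Table 5.1, in evaluator order 1 through 4.
ratings = {
    "Accuracy": [5, 5, 5, 4],
    "Efficiency": [4, 4, 5, 5],
    "Scalability": [3, 3, 4, 3],
}

# Arithmetic mean per dimension, matching the table's Average column.
averages = {dimension: mean(scores) for dimension, scores in ratings.items()}
```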
Figure 5.1: Results of ontology evaluation. Four series represent the individual evaluator ratings. The fifth series represents the average of the ratings.
Chapter 6: Conclusion and Future Work
The fact that medical study data is presently modeled in a manner that does not
allow for easy computation or specialization limits the practicality of using that data.
Pooling of the data in literature can be done manually, and calculations can then be
performed, but ontologies allow for the data to be represented in a calculable format.
The design of an ontology takes careful consideration, but many existing ontologies
are already well into development, providing the building blocks for domain-specific
ontologies to apply the powerful reasoning abilities of the Semantic Web principles
with minimal repetition of work. Collaborative datasets become increasingly powerful
and applicable to real-world situations, reaching far beyond the capabilities of static
literary text.
6.1 Discussion of Evaluation
The evaluation of RTGO indicates that, overall, RTGO scores well in terms of ontology
design. The accuracy and efficiency of the ontology are both rated very strongly,
with averages of 4.75 and 4.5, respectively. The scalability of the ontology was
determined to be slightly weaker, with an average rating of 3.25. Because this version
of RTGO is the very first attempt at constructing the ontology, these results
are not surprising. The accuracy of concepts reflects the successful transfer of terms
and knowledge from the literary publications. The evaluation of efficiency indicates
that the tool has strong potential for increasing the efficiency of accessing data
in radiation treatment planning. The scalability rating, however, indicates
room for improvement, which is unsurprising for any ontology in the initial stages of
development.
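The averages quoted above can be reproduced directly from the individual ratings in Table 5.1. The following is a minimal sketch of that arithmetic; the variable names are illustrative and are not part of RTGO.

```python
from statistics import mean

# Ratings transcribed from Table 5.1 (Evaluators 1-4, on a scale of 1-5).
ratings = {
    "Accuracy": [5, 5, 5, 4],
    "Efficiency": [4, 4, 5, 5],
    "Scalability": [3, 3, 4, 3],
}

# Average rating per quality dimension.
averages = {dimension: mean(scores) for dimension, scores in ratings.items()}

for dimension, average in averages.items():
    print(f"{dimension}: {average}")
```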
As with any software, the development process is not a one-time, start-to-finish
method. Successful software products must constantly be reviewed to determine how
well they meet the intended goals. Given that the current version of RTGO was
the initial attempt at constructing the ontology structure, and given the ontology
design challenges discussed in Section 3.3, a few changes are proposed to improve RTGO.
Currently, toxicity and endpoints are represented in RTGO as individuals. When
utilizing these endpoints, however, it became clear that, with the various endpoint
classification publications (such as CTCAE and LENT-SOMA, discussed in Section 4.4.4)
and their severity grades, the number of individuals became unwieldy.
As noted by the evaluators, these toxicity definitions contributed to a lower rating
for scalability in RTGO’s current form. Reworking the representation of medical
endpoints by providing additional sub-classes and reducing the number of individuals
required would improve the scalability. With adjustments like these,
future versions of RTGO are expected to be more scalable, allowing all types of
data (additional studies, organs, endpoint classifications, etc.) to be added easily and
logically.
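The scalability concern with endpoint individuals can be illustrated with a small counting sketch. It assumes, purely for illustration, that every combination of classification scheme, endpoint, and severity grade is stored as its own individual, and compares that with a design in which schemes and endpoints are sub-classes and grades are reusable values. The scheme names, endpoint names, and grade counts are hypothetical placeholders and are not taken from RTGO, CTCAE, or LENT-SOMA.

```python
from itertools import product

# Hypothetical placeholders, not RTGO content.
schemes = ["SchemeA", "SchemeB"]
endpoints = ["EndpointX", "EndpointY", "EndpointZ"]
grades = [1, 2, 3, 4]


def flat_individuals(schemes, endpoints, grades):
    """One named individual per (scheme, endpoint, grade) combination."""
    return [f"{s}_{e}_Grade{g}" for s, e, g in product(schemes, endpoints, grades)]


def class_based_terms(schemes, endpoints, grades):
    """Schemes and endpoints as sub-classes; grades as reusable values."""
    return {"classes": schemes + endpoints, "grade_values": list(grades)}


flat = flat_individuals(schemes, endpoints, grades)
structured = class_based_terms(schemes, endpoints, grades)
structured_count = len(structured["classes"]) + len(structured["grade_values"])

print(len(flat))         # grows multiplicatively: 2 * 3 * 4
print(structured_count)  # grows additively: 2 + 3 + 4
```

Under these assumptions, adding one more scheme adds twelve individuals to the flat design but only one term to the class-based design.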
Human evaluation of software is, in general, an inherently somewhat subjective
process. The first component, accuracy, is relatively objective; concepts should be
spelled correctly and placed accurately in the hierarchy. However, there is not always
one correct (or accurate) way to represent certain concepts. Blood, for example,
can be modeled as a bodily fluid, or as a collection of several types of cells. Since
ontologies allow for multiple inheritance, blood can be modeled in both
ways at once. To have a complete, accurate ontology, all existing concepts should be
related accurately with respect to each other.
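The blood example can be sketched by analogy with multiple inheritance in a programming language. Python classes are only a loose stand-in for ontology sub-class relationships, and the class names below are assumptions chosen for illustration rather than names from RTGO, the FMA, or SNOMED.

```python
class BodilyFluid:
    """Hypothetical parent concept: a fluid found in the body."""


class CellCollection:
    """Hypothetical parent concept: a collection of several types of cells."""


class Blood(BodilyFluid, CellCollection):
    """Blood is modeled under both parent concepts at once."""


# Blood is recognized as a kind of each parent concept.
is_fluid = issubclass(Blood, BodilyFluid)
is_cell_collection = issubclass(Blood, CellCollection)
print(is_fluid, is_cell_collection)
```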
In the future, once RTGO is a shelf-ready product, its actual efficiency could be
evaluated objectively by precisely comparing the time required to reference data
using RTGO versus literary publications. In the current initial stage of RTGO,
however, a degree of subjectivity and approximation is necessary. Given that the
evaluators were all experienced in radiation treatment planning, they could contribute
objective experience and knowledge of the current approach to the development of
a treatment plan. Although RTGO does not yet contain a full store of concepts
and study data, the basic ontology provided a foundation upon which the evaluators
could carefully – although subjectively – consider how efficient the tool would be in
the real-world environment.
The scalability of RTGO was also a facet in which both objectivity and subjec-
tivity were necessary. Objectively, if a concept such as a knee was modeled as an
individual, it would need to be instantiated every time it was used. Clearly, this is
not a scalable implementation for the concept of a knee, but the degree of decreased
scalability resulting from poor implementation decisions was certainly a subjective
determination.
Overall, although subjectivity is a factor in the evaluation, additional evaluations
with more evaluators would provide better data and help account for
variance in opinions.
6.2 Ontologies in Other Disciplines
The development and analysis of RTGO has shown that ontological representation of
knowledge can provide many benefits over traditional narrative text publications. With
the existence of anatomical and clinical ontologies in the FMA and SNOMED, the
building blocks for medical ontologies are already in place. While RTGO focuses on
storing data pertaining specifically to radiation treatment planning, it is not hard to
see how a similar ontology could be built for similar domains, especially those with
limited definitive treatment protocols, like dementia or Alzheimer’s disease. Ontologies
can help structure research data for such poorly understood diseases with
a model that is more “disciplined” [20].
6.3 Future Work
RTGO in its current form lays the groundwork for more full-featured ontology-based
systems to be developed, both in the field of radiation treatment planning and
beyond. With additional search filters and more direct integration with external data,
the tool has the potential to change substantially the way in which physicians access
radiation treatment data.
6.3.1 Additional Filters
The three filters implemented with this version of RTGO (organ, date range, and
study size) provide a proof of concept that knowledge representation within
an ontology gives the radiation treatment planning domain a structure in which
data is calculable as well as specializable. Continued development is
in progress to expand the plug-in with additional filters and search tools. Additional
filters will allow more fine-tuned searches to be performed and will provide even better
accessibility to desired data. The ability for users to access the most relevant data
allows for more efficient and accurate treatment planning.
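The way the three filters narrow a set of studies can be sketched as follows. This is a minimal illustration over in-memory records, not the RTGO plug-in's implementation; the Study record, its field names, the filter_studies function, and the sample studies are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Study:
    """Hypothetical study record; field names are assumptions."""
    title: str
    organ: str
    published: date
    patients: int


def filter_studies(studies, organ=None, start=None, end=None, min_patients=None):
    """Return studies passing every supplied filter (None disables a filter)."""
    selected = []
    for study in studies:
        if organ is not None and study.organ != organ:
            continue
        if start is not None and study.published < start:
            continue
        if end is not None and study.published > end:
            continue
        if min_patients is not None and study.patients < min_patients:
            continue
        selected.append(study)
    return selected


# Made-up sample data for illustration only.
studies = [
    Study("Hypothetical study A", "rectum", date(2005, 3, 1), 120),
    Study("Hypothetical study B", "bladder", date(2008, 7, 15), 45),
    Study("Hypothetical study C", "rectum", date(2009, 11, 30), 300),
]

matches = filter_studies(
    studies,
    organ="rectum",
    start=date(2007, 1, 1),
    end=date(2010, 12, 31),
    min_patients=100,
)
print([study.title for study in matches])
```

Treating each filter as optional means that further filters could be added as extra parameters without changing how the existing ones behave.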
In addition, once the RTGO model is in place, the way in which future radiation
treatment studies are conducted can be adjusted, because this new format of data
storage encourages studies to be more specialized. Previously, there was less of
an impetus for researchers to focus on the specifics (e.g., age, gender, and ethnicity)
of the patients involved. With a model that not only allows for, but thrives on,
specialized information, we believe that RTGO will encourage improvement in the
documentation of this type of information. The relational aspect of an ontology brings
all the data closer together, allowing the knowledge base to become a better
resource over time.
6.3.2 Integration With Journals for Full Text Access
In the current design of RTGO, access to full-text copies of literary resources is
implemented relatively simply. The PDF files of the publications reside locally
on the machine running the RTGO plug-in, and are accessed through a
local directory that is known to RTGO. In the future, however, integration
with external journals would not be difficult. Since each publication is represented
as an instance, an additional property could be added containing the URL for the
location of the article. The specifics of subscription verification and authorized
access to the journal databases would require some careful programming, but from
the perspective of RTGO, the plug-in offers a tool that is ready for the bridge to be
developed.
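A minimal sketch of the additional URL property might look like the following. The Publication record, its field names, and the choice to prefer the URL over the local PDF are assumptions made for illustration; the address uses the reserved example.org domain and the local path is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Publication:
    """Hypothetical publication instance; field names are assumptions."""
    title: str
    local_pdf: Optional[str] = None  # path in the local directory known to the tool
    url: Optional[str] = None        # proposed additional property


def full_text_location(publication):
    """Prefer the journal URL when present, else fall back to the local PDF."""
    if publication.url:
        return publication.url
    if publication.local_pdf:
        return publication.local_pdf
    raise LookupError(f"No full-text location recorded for {publication.title!r}")


local_only = Publication("Hypothetical paper A", local_pdf="papers/paper_a.pdf")
with_url = Publication(
    "Hypothetical paper B",
    local_pdf="papers/paper_b.pdf",
    url="https://example.org/paper-b",
)

print(full_text_location(local_only))
print(full_text_location(with_url))
```

Subscription verification and authorized access are deliberately left out of this sketch.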
6.3.3 Integration With Existing High Level Ontologies
As discussed in Section 3.1.1, the FMA and SNOMED ontologies were used as ref-
erences in designing the roots of the structure for RTGO. For simplicity, the classes
in RTGO were extracted from the QUANTEC paper (i.e., the classes necessary for
the specific RTGO domain) and the ontology structure was built by manually adding
these concepts. With ontologies like the FMA and SNOMED already in existence,
however, using these top-level ontologies as building blocks can expand the range of
RTGO immensely.
A successful integration between the ontologies would provide RTGO with full
access to the thousands of concepts already defined in the FMA and SNOMED.
Although the actual process of ontology integration and combination is beyond the
scope of RTGO, this kind of integration and alignment process will allow RTGO to
model an incredible amount of knowledge as a result of the advantages provided by
the Semantic Web backbone.
6.4 Final Words
Based on the principles of the Semantic Web, ontologies provide a model for knowl-
edge representation that introduces many improvements to the current model. The
reasoning, classification, and expansion capabilities allow for data to become more
usable, strongly supporting collaboration and reuse. Ontologies allow for knowledge
to be captured in a way that allows for inferences to be made computationally as
opposed to manually. In addition, access to specialized data is vastly improved when
data is stored in an ontology compared to static literary publications. In the medical
community in particular, the amount of knowledge is incredibly expansive, but the
way that the knowledge is accessed has barely changed over centuries. With the power
of computing available today, it is time for medical research data to become smarter,
more accessible, and more powerful. RTGO harnesses the power of the Semantic Web
to bring this power to the radiation treatment planning domain, and the evaluation
shows great potential for the improvement of RTGO in the future.
Bibliography
[1] Al-Kilidar, H., Cox, K., Kitchenham, B. The use and usefulness of the ISO/IEC
9126 quality standard. In Proceedings of the International Symposium on Em-
pirical Software Engineering, November, 2005, Noosa Heads, Australia.
[2] Allemang, D. and J. Hendler. Semantic Web for the Working Ontologist: Effective
Modeling in RDFS and OWL. Morgan Kaufmann Publishing. 2008.
[3] Bentzen, SM., Constine, LS., Deasy, JO., et al. Quantitative Analyses of Normal
Tissue Effects in the Clinic (QUANTEC): An Introduction to the Scientific
Issues. Int. J. Radiation Oncology Biol. Phys., Vol. 76, No. 3, Supplement, pp.
S3–S9, 2010.
[4] Bentzen, SM., Parliament, M., Deasy, JO., et al. Biomarkers and surrogate
endpoints for normal-tissue effects of radiation therapy: the importance of
dose-volume effects. Int. J. Radiation Oncology Biol. Phys., Vol. 76, No. 3,
Supplement, pp. S145–S150, 2010.
[5] Buitelaar, P., Cimiano, P., and Magnini, B. Ontology Learning from Text: An
Overview. In Proceedings of the European Conference on Artificial Intelligence
and the International Conference on Knowledge Engineering and Management.
2004.
[6] Chua, BB., Dyson, LE. Applying the ISO 9126 model to the evaluation of an
e-learning system. In Proceedings of the 21st ASCILITE Conference, December
2004, Perth, Australia.
[7] Common Terminology Criteria for Adverse Events.
http://evs.nci.nih.gov/ftp1/CTCAE/CTCAE-4.03-2010-06-14-QuickReference-8.5x11.pdf.
2010.
[8] Fernandez-Breis, JT., Aranguren, ME., and Stevens, R. Quality evaluation
framework for bio-ontologies. In Nature Proceedings, July, 2009.
[9] Foundational Model of Anatomy.
http://sig.biostr.washington.edu/projects/fm/AboutFM.html. 2011.
[10] Hunt, MA., Jackson, A., Narayana, A., et al. Geometric factors influencing
dosimetric sparing of the parotid glands using IMRT. Int. J. Radiation Oncology Biol.
Phys., Vol. 66, No. 1, pp. 296–304, 2006.
[11] i2c Why Ontologies? The Centre for Informa-
tion Systems in Infrastructure and Construction.
http://i2c.engineering.utoronto.ca/I2C/Data/Ontology/WhyOntologies.aspx.
2005.
[12] ISO/IEC 9126. http://en.wikipedia.org/wiki/ISO/IEC-9126.
[13] Jackson, A., Marks, LB., Bentzen, SM., et al. The lessons of QUANTEC:
Recommendations for reporting and gathering data on dose-volume dependencies of
treatment outcome. Int. J. Radiation Oncology Biol. Phys., Vol. 76, No. 3,
Supplement, pp. S155–S160, 2010.
[14] Jaffray, DA., Lindsay, PE., Brock, KK., et al. Accurate accumulation of dose for
improved understanding of radiation effects in normal tissue. Int. J. Radiation
Oncology Biol. Phys., Vol. 76, No. 3, Supplement, pp. S135–S139, 2010.
[15] Jeraj, R., Cao, Y., Haken, RK., et al. Imaging for assessment of
radiation-induced normal tissue effects. Int. J. Radiation Oncology Biol. Phys., Vol. 76,
No. 3, Supplement, pp. S140–S144, 2010.
[16] Marks, LB., Yorke, ED., Jackson, A., et al. Use of normal tissue complication
probability models in the clinic. Int. J. Radiation Oncology Biol. Phys., Vol. 76,
No. 3, Supplement, pp. S10–S19, 2010.
[17] Michalski, J., Gay, H., Jackson, A., et al. Radiation Dose-Volume Effects in
Radiation-Induced Rectal Injury. Int. J. Radiation Oncology Biol. Phys., Vol.
76, No. 3, Supplement, pp. S123–S129, 2010.
[18] National Center for Biomedical Ontology BioPortal.
http://bioportal.bioontology.org/. 2011.
[19] Palta, M., and R. Lee. The development of oncology treatment guidelines: an
analysis of the National Guidelines Clearinghouse. Practical Radiation Oncology
(2011) 1, 33–37.
[20] Perl, Y., Geller, J., Gu, H. Identify a forest hierarchy in an OODB specialization
hierarchy satisfying disciplined modeling. In COOPIS ’96: Proceedings of the
First IFCIS International Conference on Cooperative Information
Systems, 1996.
[21] Protege developer documentation. http://protege.stanford.edu/doc/dev.html
2011.
[22] Rosse, C., Mejino, JL. A reference ontology for biomedical informatics: the
Foundational Model of Anatomy. Journal of Biomedical Informatics 36 (2003) 478–500.
[23] SNOMED Clinical Terms Summary. http://bioportal.bioontology.org/ontologies/42122.
2011.
[24] Stanford Center for Biomedical Informatics Research: What is Protege?.
http://protege.stanford.edu/overview/index.html. 2011.
[25] Stvilia, B. A model for ontology quality evaluation. In First Monday, December,
2007.
[26] Temal, L., Dojat, M., Kassel, G., Gibaud, B. Towards an ontology for sharing
medical images and regions of interest in neuroimaging. Journal of Biomedical
Informatics 41 (2008) 766–778, 2007.
[27] Viswanathan, AN., Yorke, ED., Marks, LB., et al. Radiation Dose-Volume Effects
of the Urinary Bladder. Int. J. Radiation Oncology Biol. Phys., Vol. 76, No. 3,
Supplement, pp. S116–S122, 2010.
Vita
Thomas M. Minta
Department of Computer Science
Wake Forest University
Post Office Box 7311
Winston-Salem, NC 27109
Education
M.S. Computer Science, Wake Forest University, 2011.
B.S. Computer Science, Wake Forest University, 2008.
Employment
Technology Analyst, University Advancement, Wake Forest University, May 2011-Present.
Research
Minta, T.M., Fulp, E.W., Ge, Y., Turkett, W.H. (2011) Ontological Representation of
Radiation Treatment Data.
Conference and Seminar Presentations
A Comparison of Static to Biologically Modeled Intrusion Detection Systems, Fukuoka
Institute of Technology, November 6, 2010.