
Abstraction Networks for Terminologies

Yehoshua Perl

Computer Science Dept.

New Jersey Institute of Technology

Newark, NJ 07102 USA

yehoshua.perl@gmail.com

09/12/12


Overview

• What are abstraction networks of terminologies?

• Characteristics of the abstraction networks

• Examples of abstraction network derived for UMLS, SNOMED CT and the MED

• Uses of abstraction networks in visual summarization, orientation, auditing and navigation of terminologies


Motivation

• Terminologies are playing major roles in healthcare information systems.

• They are large, complex and difficult to maintain.

• Graphical displays are needed for better orientation to aid terminology use and maintenance.

• We have introduced abstraction networks as a way to support orientation.


Nature of Abstraction Networks

• Most terminologies have a network structure, with a backbone of IS-A relationships.

• An abstraction network is a secondary network that provides a compact view of the structure and content of the primary terminology.

[Figure: a terminology network and its abstraction network]


Derivation of Abstraction Networks

• Abstraction of a terminology is the process by which subsets of concepts are each replaced by a higher-level conceptual entity called a node.

• These nodes are interconnected by child-of hierarchical relationships.

[Figure: a terminology of concepts and its abstraction network of nodes; each node models a subset of concepts]


Abstraction Network Characteristics (1)

• Three characteristics
– Disjointness
– Derivation origin
– Abstraction ratio

• Disjointness: Does an abstraction network divide the underlying terminology into disjoint parts?

[Figure: a disjoint abstraction network versus an intersection abstraction network]


Abstraction Network Characteristics (2)

• Derivation Origin: Are the nodes derived from the terminology (intrinsic) or are they formulated based on some external knowledge (extrinsic)?

[Figure: intrinsic derivation versus extrinsic derivation]

• Abstraction ratio = (# concepts of terminology) / (# nodes of abstraction network)


Intersection Abstraction Network

• An abstraction network is disjoint if each concept of the terminology is mapped to a unique node.

• An abstraction network is an intersection abstraction network if some concepts belong to multiple nodes.

[Figure: nodes Anatomical Abnormality and Disease, with the concept Dynamic subaortic stenosis]
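The two definitions above can be checked mechanically. The following is a minimal Python sketch, assuming the abstraction network is available as a mapping from node names to sets of concept names. The function name `classify_network` and the concepts `concept_a` and `concept_b` are hypothetical, introduced only for illustration; the sketch also assumes every concept is mapped to at least one node.

```python
from collections import Counter


def classify_network(node_extents):
    """Classify a node -> set-of-concepts mapping as 'disjoint' or 'intersection'."""
    # Count in how many nodes each concept appears.
    membership = Counter(
        concept for extent in node_extents.values() for concept in set(extent)
    )
    # Concepts that belong to more than one node.
    shared = sorted(c for c, n in membership.items() if n > 1)
    kind = "intersection" if shared else "disjoint"
    return kind, shared


# Toy data: only "Dynamic subaortic stenosis" comes from the slide;
# the other concept names are placeholders.
nodes = {
    "Anatomical Abnormality": {"Dynamic subaortic stenosis", "concept_a"},
    "Disease": {"Dynamic subaortic stenosis", "concept_b"},
}
kind, shared = classify_network(nodes)
print(kind, shared)
```

Returning the shared concepts alongside the label makes it easy to see which concepts prevent the network from being disjoint.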


More on Orientation

• An abstraction network offers a high-level view of the terminology for orientation into its content.

• The orientation problem has two facets
– Orientation on the macro level to provide context for the content and structure of the whole terminology.
– Orientation on the micro level into details of small portions of the terminology.

• Without an orientation on the macro level, it is difficult to obtain an orientation on the micro level due to lack of context.

• Abstraction networks provide macro level orientation.


Example Abstraction Networks

• We cover abstraction networks for some known terminological systems.
– UMLS
– SNOMED CT
– MED

• We describe the derivation for each example.

• We categorize them according to the three characteristics above: disjointness, derivation origin and abstraction ratio.


An Abstraction Network for the UMLS Metathesaurus

• The two major knowledge sources of the UMLS
– The Metathesaurus (META)
– The Semantic Network (SN)

• The META is a large repository of concepts compiled from more than 160 source vocabularies.

• The 2011AB release of META comprises about 8.6 million terms mapped into more than 2.6 million concepts.


Semantic Network Excerpt

[Figure: excerpt of the SN hierarchy. Semantic types shown: Entity; Event; Physical Object; Conceptual Entity; Organism Attribute; Clinical Attribute; Anatomical Structure; Fully Formed Anatomical Structure; Anatomical Abnormality; Congenital Abnormality; Acquired Abnormality; Phenomenon or Process; Injury or Poisoning; Natural Phenomenon or Process; Biologic Function; Pathologic Function; Disease or Syndrome; Cell or Molecular Dysfunction; Experimental Model of Disease; Mental or Behavioral Dysfunction; Neoplastic Process]


Semantic Network

• SN consists of 133 semantic types (high-level categories).

• The SN is organized through IS-A hierarchical relationships in two trees rooted at Entity and Event, respectively.


Characteristics of the SN abstraction network

• The SN is an extrinsic abstraction network for META, since it is not derived from META.

• Each concept in META is assigned one or more of SN's semantic types.

• Thus, SN is an intersection abstraction network since a concept may be assigned multiple semantic types.

• SN exhibits an abstraction ratio of about 19,500:1.

• SN has been used in conjunction with the underlying META in a variety of applications.

• A PubMed search for “Metathesaurus Semantic Network” returned 95 papers.


Simple & Compound Semantics

• In the SN intersection abstraction network, concepts with a single category have a simple semantics.

• Concepts with multiple categories have a compound semantics, elaborated by the respective category combination.

• Concepts with compound semantics are complex since they are both “a this and a that”.

[Figure: semantic types Anatomical Abnormality and Disease or Syndrome; concepts Deformity and Eyelid Diseases (simple semantics) and Lacrimal Duct Obstruction (compound semantics)]


Intersection of Semantic Types

• The extent of a Semantic Type S is the set of concepts assigned S.

• There are 73 concepts in the extent of Experimental Model of Disease (EMD).

• Experimental Model of Disease has an intersection with Neoplastic Process (NP).

[Figure: the extents of EMD and NP overlap in EMD ∩ NP (26 concepts)]


Non-Uniform Semantics

• Within EMD’s extent, 26 concepts are both experimental models of disease and neoplastic processes, and 47 are only experimental models of disease.

• This non-uniformity makes the extent of the EMD semantic type difficult to comprehend.

[Figure: the extent of EMD split into EMD only (47) and EMD ∩ NP (26)]


Refined Semantic Network (RSN)

• To address this non-uniformity, we introduced the “Refined Semantic Network” (“RSN”) [Gu, JAMIA 2000].

• RSN comprises two kinds of types: pure semantic types and intersection types.

• The extent of a pure semantic type S is the subset of concepts assigned S exclusively.

• For example, the extent of the pure semantic type Experimental Model of Disease consists of the 47 concepts assigned only EMD.


Intersection Types

• An intersection type is a reification of a non-empty intersection of the extents of semantic types.

• Example: the RSN contains an intersection type EMD ∩ NP with an extent of 26 concepts.

[Figure: the intersection type EMD ∩ NP (26 concepts) as the overlap of the extents of EMD and NP]
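The split of an extent into pure and intersection types can be illustrated by grouping concepts by their exact set of assigned semantic types. This is a minimal Python sketch, not the published RSN derivation: the function name `refine` and the synthetic concept identifiers are assumptions, and the data is sized only to match the counts quoted for EMD (47 and 26).

```python
from collections import defaultdict


def refine(assignments):
    """Group concepts by their exact set of semantic types.

    Keys holding one type correspond to pure semantic types; keys
    holding two or more types correspond to intersection types.
    """
    extents = defaultdict(set)
    for concept, types in assignments.items():
        extents[frozenset(types)].add(concept)
    return dict(extents)


# Synthetic assignments (hypothetical concept ids) matching the slide's counts.
assignments = {f"emd_only_{i}": {"EMD"} for i in range(47)}
assignments.update({f"emd_np_{i}": {"EMD", "NP"} for i in range(26)})

rsn = refine(assignments)
pure_emd = rsn[frozenset({"EMD"})]
emd_and_np = rsn[frozenset({"EMD", "NP"})]
print(len(pure_emd), len(emd_and_np), len(pure_emd | emd_and_np))
```

Because every concept lands in exactly one group, the resulting extents are disjoint, and the two groups together recover the 73 concepts of the original EMD extent.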


Excerpt of the Refined Semantic Network

[Figure: semantic types shown: Entity; Event; Physical Object; Anatomical Structure; Anatomical Abnormality; Congenital Abnormality; Acquired Abnormality; Phenomenon or Process; Natural Phenomenon or Process; Human-caused Phenomenon or Process; Biologic Function; Pathologic Function; Disease or Syndrome; Mental or Behavioral Dysfunction; Neoplastic Process; Experimental Model of Disease. Intersection semantic types shown: Acquired Abnormality ∩ Disease or Syndrome; Anatomical Abnormality ∩ Disease or Syndrome; Congenital Abnormality ∩ Disease or Syndrome; Experimental Model of Disease ∩ Neoplastic Process; Natural Phenomenon or Process ∩ Human-caused Phenomenon or Process]


Characteristics of the RSN

• The RSN is an intrinsic abstraction network derived automatically from the SN and its semantic-type assignments to the concepts of META.

• The RSN is a disjoint abstraction network.

• The RSN contains a total of 539 types, including 406 intersection types and 133 semantic types.

• The abstraction ratio is approximately 4,800:1.
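The ratios quoted for the SN (about 19,500:1) and the RSN can be reproduced from the figures given in the slides. This short Python sketch assumes a round count of 2.6 million META concepts (the slides say "more than 2.6 million"); the function name `abstraction_ratio` is introduced here only for illustration.

```python
def abstraction_ratio(num_concepts, num_nodes):
    """Number of terminology concepts per node of the abstraction network."""
    return num_concepts / num_nodes


META_CONCEPTS = 2_600_000  # assumption: "more than 2.6 million" taken as a round figure
SN_TYPES = 133             # semantic types in the SN
RSN_TYPES = 406 + 133      # intersection types plus semantic types

sn_ratio = abstraction_ratio(META_CONCEPTS, SN_TYPES)
rsn_ratio = abstraction_ratio(META_CONCEPTS, RSN_TYPES)
print(f"SN:  {sn_ratio:,.0f}:1")
print(f"RSN: {rsn_ratio:,.0f}:1")
```

With these inputs the computed values round to the quoted "about 19,500:1" and "approximately 4,800:1".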


RSN Properties

• The RSN hierarchy is a directed acyclic graph (DAG) because intersection types have multiple parents.

• RSN’s hierarchical depth is 11 as compared to depth 9 for SN.

• In the description of the first version of SN, McCray & Hole state:
– “The current scope of the [Semantic] Network is quite broad, yet the depth is fairly shallow.
– We expect to make future refinements and enhancements to the Network, based on actual use and experimentation.”

• Introduction of the RSN abstraction network is a step in the planned direction.


Uses of RSN (1)

• The RSN has proven to be an excellent vehicle for supporting UMLS auditing.

• The intersection types with very small extents (1-6 concepts) proved to have a high likelihood of errors.

• Structural group auditing was introduced for extents of the RSN [Chen, JBI 2009, JAMIA 2011].
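The small-extent heuristic above lends itself to a simple filter. This is a minimal Python sketch, assuming extents are stored as a mapping from a frozenset of semantic types to a set of concepts. The function name, the type labels `T1` to `T4` and all concept identifiers are hypothetical; only the 1-6 threshold and the EMD counts come from the slides.

```python
def small_intersection_types(extents, max_size=6):
    """Return (types, size) for intersection types with 1..max_size concepts."""
    return sorted(
        (tuple(sorted(types)), len(concepts))
        for types, concepts in extents.items()
        # Intersection types combine two or more semantic types.
        if len(types) >= 2 and 1 <= len(concepts) <= max_size
    )


extents = {
    frozenset({"EMD"}): {f"c{i}" for i in range(47)},
    frozenset({"EMD", "NP"}): {f"d{i}" for i in range(26)},
    frozenset({"T1", "T2"}): {"x1", "x2"},   # hypothetical small extent
    frozenset({"T1", "T3", "T4"}): {"y1"},   # hypothetical small extent
}
flagged = small_intersection_types(extents)
print(flagged)
```

The filter only proposes candidates for review; whether a flagged concept is actually in error remains a judgment for a human auditor.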


Uses of RSN (2)

• RSN can aid in efficient navigation of the content of META.

• The “Chemical Specialty Semantic Network” abstraction network focuses on the chemical concepts of the UMLS [Morrey, Cheminformatics 2012].

• The RSN framework supports accurate modeling of complex and conjugate chemicals [Chen, JAMIA 2009].


Taxonomies for SNOMED CT

• Three related kinds of taxonomies have been formulated as abstraction networks for description-logic-based (DL) terminologies.

• They are the area taxonomy, the partial-area taxonomy, and the disjoint partial-area taxonomy.

• Examples of DL terminologies: SNOMED CT and NCIt.

• Taxonomies are also applicable to similarly modeled terminologies.
– Convergent Medical Terminology (CMT) of Kaiser Permanente
– Enterprise Reference Terminology (ERT) of the VA


Area Taxonomy

• The nodes of the area taxonomy are derived from a partition of a terminology based on the relationships of its concepts.

• Concepts with the exact same relationships are grouped together into an area.

• In the area taxonomy, each area is a node.

[Figure: three concepts with the relationships morphology and topography grouped into the area {morphology, topography}]


Area Taxonomy for Specimen


Area Taxonomy

• The area taxonomy is disjoint since each concept has a unique set of relationships.

• Areas are connected with links called child-of relationships.
– A root is a top-level concept in an area whose parents all reside in other areas.
– There can be multiple roots per area.

[Figure: a child-of link between areas A and B, derived from an IS-A link between their concepts]
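The derivation of areas, roots and child-of links can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the terminology is given as two dictionaries (relationships per concept and IS-A parents per concept), and the function name and the toy concepts (`top`, `s1`, `s2`, `s3`, `m1`) are hypothetical.

```python
from collections import defaultdict


def derive_area_taxonomy(relationships, parents):
    """Return (area_of, roots, child_of) for a small terminology.

    relationships: concept -> set of relationship names
    parents:       concept -> set of IS-A parent concepts
    """
    # An area is identified by the exact set of relationships of its concepts.
    area_of = {c: frozenset(rels) for c, rels in relationships.items()}

    # A root is a concept none of whose parents lies in its own area.
    roots = defaultdict(set)
    for concept, area in area_of.items():
        if all(area_of[p] != area for p in parents.get(concept, ())):
            roots[area].add(concept)

    # Area X is child-of area Y if a root of X has an IS-A parent in Y.
    child_of = set()
    for area, area_roots in roots.items():
        for root in area_roots:
            for p in parents.get(root, ()):
                child_of.add((area, area_of[p]))
    return area_of, dict(roots), child_of


relationships = {
    "top": set(),
    "s1": {"substance"},
    "s2": {"substance"},
    "s3": {"substance"},
    "m1": {"morphology", "topography"},
}
parents = {"s1": {"top"}, "s2": {"top"}, "s3": {"s1", "s2"}, "m1": {"top"}}

area_of, roots, child_of = derive_area_taxonomy(relationships, parents)
print(sorted(roots[frozenset({"substance"})]))
print(len(child_of))
```

In the toy data the {substance} area has two roots, `s1` and `s2`, while `s3` is not a root because its parents lie in its own area.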


Partial-Area Taxonomy

• The partial-area taxonomy refines the area taxonomy by considering local hierarchical configurations within an area.

• A partial-area is a division of an area consisting of a root with all its descendants in the area.

• Each partial-area is a node within the area.

• The partial-area taxonomy is not disjoint.

[Figure: an area with roots A, B and C divided into partial-areas A (4), B (6) and C (3)]
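The definition of a partial-area, and the reason the taxonomy is not disjoint, can be seen in a small example. This Python sketch assumes one area given as a set of concepts plus an IS-A parent map; the function name and the toy concepts (`A`, `B`, `a1`, `b1`, `ab`, `top`) are hypothetical and do not reproduce the counts in the figure.

```python
def partial_areas(area_concepts, parents):
    """Map each root of an area to the root plus its descendants inside the area."""
    area_concepts = set(area_concepts)
    children = {c: set() for c in area_concepts}
    roots = set()
    for c in area_concepts:
        # Only parents inside the area matter for the local hierarchy.
        in_area_parents = parents.get(c, set()) & area_concepts
        if not in_area_parents:
            roots.add(c)
        for p in in_area_parents:
            children[p].add(c)

    result = {}
    for root in roots:
        # Depth-first traversal restricted to the area.
        seen, stack = {root}, [root]
        while stack:
            for child in children[stack.pop()]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        result[root] = seen
    return result


area = {"A", "B", "a1", "b1", "ab"}
parents = {"A": {"top"}, "B": {"top"}, "a1": {"A"}, "b1": {"B"}, "ab": {"A", "B"}}

pas = partial_areas(area, parents)
for root in sorted(pas):
    print(root, len(pas[root]), sorted(pas[root]))
print(sum(len(m) for m in pas.values()), "vs", len(area))
```

The concept `ab` descends from both roots, so it is counted in both partial-areas and the sum of the partial-area sizes (6) exceeds the size of the area (5).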


Partial-Area Taxonomy


Summary Visualization

• A partial-area taxonomy refines the visualization of the area taxonomy.

• For example, inside area {substance}, there are 11 white boxes, each with the name of the respective partial-area and the number of concepts.

• A partial-area is named after its root, which represents the overarching semantics of the group.


Overlap of Partial Areas

• The partial-area taxonomy provides a summarization of the 102 concepts that only exhibit the substance relationship.

• The sum of the cardinalities of the four large partial-areas, 137, is greater than the cardinality, 102, of the entire area.

• This occurs due to the overlap among these four non-disjoint partial-areas.


Auditing Small Partial Areas

• In the partial-area taxonomy, we see many small partial-areas of one or two concepts.

• As shown in [Halper, AMIA 2007], the partial-areas of very few concepts have a higher likelihood of concepts in error.

• The partial-area taxonomy visualization serves to enhance a framework for quality-assurance.


Overlaps of Partial Areas

• Concepts in multiple partial-areas complicate the categorization of the partial-area taxonomy.

• In a given partial-area, some concepts belong solely to that partial-area, elaborating the semantics of its root only; others belong to multiple partial-areas.

• By separating the overlapping concepts from the non-overlapping ones, we get a partition of the concepts of an area into disjoint partial-areas with no overlaps.

[Figure: an area with roots A, B and C partitioned into disjoint partial-areas A (3), B (5), C (3) and D (1)]
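One simple way to see how overlaps can be separated is to group each concept by the set of roots it descends from. This Python sketch is a deliberate simplification and not the recursive procedure of [Wang, JBI 2012], which may split such groups further; the function name and the toy partial-areas are hypothetical.

```python
from collections import defaultdict


def split_by_root_sets(partial_areas):
    """Group concepts by the exact set of partial-area roots they belong under."""
    roots_of = defaultdict(set)
    for root, members in partial_areas.items():
        for concept in members:
            roots_of[concept].add(root)

    groups = defaultdict(set)
    for concept, roots in roots_of.items():
        groups[frozenset(roots)].add(concept)
    return dict(groups)


# Two overlapping partial-areas sharing the concept "ab" (toy data).
pas = {"A": {"A", "a1", "ab"}, "B": {"B", "b1", "ab"}}

groups = split_by_root_sets(pas)
for roots in sorted(groups, key=lambda r: (len(r), sorted(r))):
    print(sorted(roots), sorted(groups[roots]))
print(sum(len(g) for g in groups.values()))
```

Each concept now falls in exactly one group, so the group sizes add up to the number of distinct concepts (5) instead of the inflated total of the overlapping partial-areas (6).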


Disjoint Partial Area Taxonomy

• A Disjoint Partial Area Taxonomy is a refinement of the partial-area taxonomy.

• The disjoint partial-areas are the nodes.

• These nodes are connected via child-of links, in a manner similar to (but more complex than) that in a partial-area taxonomy.

• The partitioning is carried out in a recursive manner due to the potential of “hierarchical tangling” within an area (see [Wang, JBI 2012]).


Excerpt of the disjoint partial-area taxonomy for the {substance} area


Better Orientation

• This figure illustrates how the disjoint partial-area taxonomy supports orientation to the most tangled parts of a SNOMED hierarchy, such as area {substance} of the Specimen hierarchy.

• Six color-coded overlapping partial-areas are on Level 1.

• The overlaps among these six partial-areas are displayed utilizing combinations of their color coding.

• They are arranged in layers according to the number of overlapping partial-areas.


Orientation into a Tangled Hierarchy

• There are 7 disjoint partial-areas, with a total of 30 concepts, inheriting from both partial-areas Body substance sample and Fluid sample.

• The largest disjoint partial-area, Body fluid sample, has 15 concepts, which were counted twice before, once with respect to Body substance sample (55) and the other with respect to Fluid sample (44).

• The other six disjoint partial-areas (on Level 3) are overlaps of three partial-areas, where Blood specimen (25) is the third with 15 overlapping concepts counted three times in the partial-area taxonomy.

• By the arrangement of these 30 concepts into disjoint partial-areas, the figure gives a picture of their actual nature and respective grouping, with largest disjoint partial-area Acellular blood (serum or plasma) specimen (9).


Use in Auditing and Orientation

• In [Wang, JBI 2012], such overlapping concepts were shown to have a statistically significantly higher ratio of errors.

• This taxonomy yields insights into the modeling of tangled portions of a hierarchy that can lead to improvements.


Characteristics of the Taxonomies

• All three of these abstraction networks are intrinsic as they are derived strictly from the terminology.

• The area taxonomy and disjoint partial-area taxonomy are disjoint. The partial-area taxonomy is not disjoint.

• The abstraction ratios for the area taxonomy and partial-area taxonomy are 58 (= 1,330 / 23) and 3.26 (= 1,330 / 407), respectively. For the disjoint partial-area taxonomy, the ratio is 2.73 (= 1,330 / 487).


An Abstraction Network for the MED

• In 2000, we presented an abstraction network for the Medical Entities Dictionary (MED) of Columbia.

• The group of all concepts with the same set of properties (i.e., attributes and relationships) is represented by a node with the same attributes and relationships.

[Figure: example nodes labeled by their property sets: x, a, ax, bx, cx]


Root of a Node

• A concept is a root of a given node if none of its parent concepts belongs to the node.

• A child-of relationship is defined from node A to node B to reflect an IS-A relationship from the root concept of A to a concept in B.

• A root names the node since it generalizes all its concepts.


MED Abstraction Network Has 2 Kinds of Nodes

• The first kind, called a property-introduction node, has a unique root for which new properties are defined.

• The second kind, called an intersection node, has multiple parents from different nodes.

• It inherits properties from each of its parents and thus has more properties than any single parent.


Excerpt from MED Abstraction Network

[Figure: nodes shown: Medical Entity; Anatomic Entity; Sampleable Entity; Measurable Entity; Etiologic Agent; Disease or Syndrome; ICD9 Element; Laboratory or Test Result; Event Component; CPMC Radiology Term; Diagnostic Procedure; Laboratory Results; Abnormal Findings in Body Substances; Number or String Result; ICD9 (or CPT) Procedures; Culture Results; Smear Results; ID Number Plus Text Results; Date Result; Quantity Result; Numeric Result Restricted to Given Range of Values; CPMC Electrocardiograph Procedure; Laboratory Diagnostic Procedure; Chemical; Antibiotics; Single-Result Laboratory Test; CPMC Laboratory Diagnostic Procedures; Physical Anatomic Entity; Water; Cell; Mental or Behavioral Dysfunction; Coma; Cardiac Dysrhythmia; Microorganism; Organisms Seen on Smear; Radiology Event Component; Orderable Tests; ICD9 Diagnostic Procedure; Microscopic Examination; Image-Guided Interventional Procedure; Calcified Body Part or Structure; Abnormal Blood Hematology; Anemia; Hypoglycemia; Adrenal Calcification]


Deriving the MED Abstraction Network

• The abstraction network obtained is disjoint since descendants of more than one property-introduction root are defined to be concepts of a unique intersection node.

• A program to create such an abstraction network for a given terminology satisfying Cimino’s desiderata is given in [Liu, Distributed and Parallel Databases, 1999].


Properties of MED Abstraction Network

• For the MED, consisting of about 43,000 concepts (1996 version), the abstraction network contains 90 nodes: 53 property-introduction nodes and 37 intersection nodes.

• For the InterMED (a small offshoot of the MED of about 2,800 concepts), an abstraction network of 28 nodes was derived.

• The abstraction ratios for these two terminologies are respectively 478:1 and 89:1.

• The MED exhibits the characteristic of a unique introduction concept for each property.
– Thus, the number of introduction nodes is bounded by the number of properties in the MED.


Abstraction Network from MED Excerpt

[Figure: nodes shown: Medical Entity; Measurable Entity; Specimen; Etiologic Agent; Disease or Syndrome; ICD9 Element; Laboratory or Test Result; Pharmacy Item (Drug and Nondrug); Drug Enforcement Agency (DEA) Controlled Substance Category; Number or String Result; Unknown and Unspecified Cause of Morbid or Mortality; Diagnostic Procedure; American Hospital Formulary Service Class; Laboratory Diagnostic Procedure; Antihistamine Drug; Heart Disease; Single-Result Laboratory Test; CPMC Laboratory Diagnostic Procedure; Sampleable Entity; Calcified Pericardium; Pancreatin; Allen Serum Amylase Measurement; Chemical; Anatomical Structure]


Excerpt from MED

[Figure: concepts shown: Medical Entity; Conceptual Entity; Sampleable Entity; Measurable Entity; Physical Object; Event; Specimen; Etiologic Agent; Substance; Anatomic Structure; Orderable Entity; Intellectual Product; Patient Problem; Intravascular Fluid Specimen; Activity; Classification; Disease or Syndrome; Finding; Acquired Abnormality; Chemical; Serum Specimen; Intravascular Chemistry Specimen; Occupational Activity; Pharmacy Concepts; ICD9 Element; Laboratory or Test Result; Lesion; Chemical Viewed Structurally; Serum Chemistry Specimen; Health Care Activity; Pharmacy Item (Drug and Nondrug); Drug Enforcement Agency (DEA) Controlled Substance Category; ICD9 Disease; Number or String Result; Calcified Body Part or Structure; Organic Chemical; Allen Serum Specimen; Laboratory Procedure; Diagnostic Procedure; American Hospital Formulary Service Class; CPMC Formulary Drug Item; Disorder of Circulatory System; Common In-Patient Diagnoses; Diphenhydramine; Amino Acid, Peptide or Protein; Laboratory Diagnostic Procedure; Antihistamine Drug; Drug Enforcement Agency (DEA) Class 0; Cardiovascular Disease; Enzyme; Single-Result Laboratory Test; CPMC Laboratory Diagnostic Procedure; Heart Disease; Amylase; Single-Result Chemistry Test; CPMC Chemistry Panels; Diphenhydramine Preparation; CPMC Drugs; Benadryl 25 MG Cap; Disease of Pericardium; Disease of Pericardium, Other (ICD9); Calcified Pericardium; Pancreatin; Intravascular Chemistry Test; Serum Chemistry Test; Serum Amylase Test; Serum Total Amylase Test; Allen Serum Amylase Measurement; Amylase Panels]


Uses of MED Abstraction Network

• The abstraction network serves to capture the essence of the MED while ignoring its minutiae.

• It helped to expose and repair some errors and inconsistencies in the MED [Gu, JAMIA 1999].

• It can help in accelerating navigation of the terminology in the search for a concept, the name of which is unfamiliar or forgotten.
– Like “drive on highways, switch to service road near destination.”


Meta-Abstraction Networks

• The abstraction network may still be too large for a compact display on a computer screen.

• In such a case, it is possible to re-apply abstraction and create an abstraction network of an abstraction network, called a meta-abstraction network.

[Figure labels: Terminology, Abstraction Network, Meta-abstraction Network]
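Re-applying abstraction can be pictured as collapsing a graph twice. The sketch below uses a generic grouping step (nodes are merged into groups, and edges between distinct groups are kept), which is an assumption chosen for illustration and not one of the derivation techniques from the cited work. All names (`abstract`, `c1`..`c6`, `N1`..`N4`, `M1`, `M2`) are hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def abstract(edges: Set[Edge], group_of: Dict[str, str]) -> Set[Edge]:
    """Collapse nodes into groups; keep one edge per pair of distinct groups."""
    return {
        (group_of[a], group_of[b])
        for a, b in edges
        if group_of[a] != group_of[b]  # edges inside a group disappear
    }


# Hypothetical terminology: (child, parent) links between six concepts.
concept_edges = {("c1", "c3"), ("c2", "c3"), ("c3", "c5"),
                 ("c4", "c5"), ("c5", "c6")}

# First abstraction: concepts -> abstraction-network nodes.
concept_to_node = {"c1": "N1", "c2": "N1", "c3": "N2",
                   "c4": "N3", "c5": "N3", "c6": "N4"}
network_edges = abstract(concept_edges, concept_to_node)

# Second abstraction: network nodes -> meta-abstraction nodes.
node_to_meta = {"N1": "M1", "N2": "M1", "N3": "M2", "N4": "M2"}
meta_edges = abstract(network_edges, node_to_meta)
```

In this toy case six concepts shrink to four network nodes and then to two meta nodes, using the same function at both levels.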


Meta-Abstraction Networks

• Meta-abstraction networks are analogous to the meta-level networks found in data modeling and database systems.

• In the following, we discuss two such meta-abstraction structures defined with respect to the UMLS's Semantic Network (SN):

– The cohesive metaschema [Perl, JBI 2003]

– The semantic group collection of NLM [McCray, MEDINFO 2001]


Discussion

• The notion of an abstraction network for a medical terminology was formulated.

• The features of abstraction networks were discussed.

• We presented examples of existing abstraction networks.

• The need for abstraction networks in terms of their support for comprehension, visualization, navigation, and maintenance of terminology content was illustrated.


A Posteriori Derivation

• An abstraction network is analogous to the notion of a database schema.

• All the previous examples were developed a posteriori from their underlying terminologies.

[Figure labels: A priori: Schema, DB. A posteriori: Abstraction Network, Terminology.]


A Priori Design of Abstraction Networks for Terminologies

• Ideally, the abstraction network would be developed a priori to guide the design of a terminology, similar to database design.

• We propose that terminology designers proceed in a top-down fashion of first creating an abstraction network for the desired terminology.

• We expect this approach to improve both the efficiency and the correctness of terminology design.

• We hope that this NCBO webinar will motivate such future design approaches.


Next Challenge in Abstraction Network Design

• The example abstraction networks illustrate various derivation techniques needed for different terminologies based on a variety of models.

• It can be tedious research work deriving new kinds of abstraction networks for each new kind of terminology encountered.

• The hope for more widespread use of abstraction networks lies in the standardization of their derivation.

• We saw the same derivation technique applied to SNOMED and NCIt.

• If, in the same way, we identify families of terminologies that are similar in their properties and models, like these two DL terminologies, then we can probably devise a common technique for the automatic derivation of an abstraction network for each member of a family.

• The ontologies hosted in the NCBO BioPortal offer an opportunity for such design. We started with the OCRe ontology [Ochs, AMIA 2012].


References

• MED

• Gu H, Cimino JJ, Halper M, Geller J, Perl Y. Utilizing OODB Schema Modeling for Vocabulary Management. In: Cimino JJ, editor. Proc. 1996 AMIA Annual Fall Symposium. Washington, DC; 1996. p. 274-278.

• Gu H, Halper M, Geller J, Perl Y. Benefits of an Object-Oriented Database Representation for Controlled Medical Terminologies. JAMIA. 1999 July/August;6(4):283-303.

• Liu L, Halper M, Gu H, Geller J, Perl Y. Modeling a Vocabulary in an Object-Oriented Database. In: Barker K, Ozsu MT, editors. CIKM-96, Proc. 5th Int'l Conference on Information and Knowledge Management. Rockville, MD; 1996. p. 179-188.

• Liu L, Halper M, Geller J, Perl Y. Controlled Vocabularies in OODBs: Modeling Issues and Implementation. Distributed and Parallel Databases. 1999 Jan;7(1):37-65.

• Liu L, Halper M, Geller J, Perl Y. Using OODB Modeling to Partition a Vocabulary into Structurally and Semantically Uniform Concept Groups. IEEE Trans Knowledge & Data Engineering. 2002 July/August;14(4):850-866.


References

• UMLS

• Gu H, Perl Y, Geller J, Halper M, Liu L, Cimino JJ. Representing the UMLS as an OODB: Modeling Issues and Advantages. JAMIA. 2000 Jan/Feb;7(1):66-80. Selected for reprint in: R. Haux and C. Kulikowski, editors, Yearbook of Medical Informatics: Digital Libraries and Medicine (International Medical Informatics Association), pages 271-285, Schattauer, Stuttgart, Germany, 2001.

• Geller J, Gu H, Perl Y, Halper M. Semantic Refinement and Error Correction in Large Terminological Knowledge Bases. Data & Knowledge Engineering. 2003 Apr;45(1):1-32.

• Morrey CP, Perl Y, Halper M, Chen L, Gu H. A Chemical Specialty Semantic Network for the Unified Medical Language System. Journal of Cheminformatics. 2012 May;4(2). doi:10.1186/1758-2946-4-9.

• Gu H, Elhanan G, Perl Y, Hripcsak G, Cimino JJ, Xu J, et al. A Study of Terminology Auditors' Performance for UMLS Semantic Type Assignments. Journal of Biomedical Informatics (2012), http://dx.doi.org/10.1016/j.jbi.2012.05.006 (in press).

• Gu H, Perl Y, Elhanan G, Min H, Zhang L, Peng Y. Auditing Concept Categorizations in the UMLS. Artificial Intelligence in Medicine. 2004;31(1):29-44.

• Zhang L, Perl Y, Halper M, Geller J, Cimino JJ. An Enriched Unified Medical Language System Semantic Network with a Multiple Subsumption Hierarchy. JAMIA. 2004 May/June;11(3):195-206.

• Chen L, Morrey CP, Gu H, Halper M, Perl Y. Modeling multi-typed structurally viewed chemicals with the UMLS Refined Semantic Network. J Am Med Inform Assoc. 2009 Jan-Feb;16(1):116-31.


References

• SNOMED-CT

• Wang Y, Halper M, Min H, Perl Y, Chen Y, Spackman KA. Structural Methodologies for Auditing SNOMED. Journal of Biomedical Informatics. 2007 Oct;40(5):561-581.

• Min H, Perl Y, Chen Y, Halper M, Geller J, Wang Y. Auditing as Part of the Terminology Design Life Cycle. JAMIA. 2006 November/December;13(6):676-690.

• Wang Y, Halper M, Wei D, Perl Y, Geller J. Abstraction of Complex Concepts with a Refined Partial-Area Taxonomy of SNOMED. Journal of Biomedical Informatics. 2012 Feb;45(1):15-29.

• Wang Y, Halper M, Wei D, Gu H, Perl Y, Xu J, et al. Auditing Complex Concepts of SNOMED using a Refined Hierarchical Abstraction Network. Journal of Biomedical Informatics. 2012 Feb;45(1):1-14.

• Halper M, Wang Y, Min H, Chen Y, Hripcsak G, Perl Y, et al. Analysis of Error Concentrations in SNOMED. In: Teich JM, Suermondt J, Hripcsak G, editors. Proc. 2007 AMIA Annual Symposium. Chicago, IL; 2007. p. 314-318.


References

• METASCHEMA

• Perl Y, Chen Z, Halper M, Geller J, Zhang L, Peng Y. The cohesive metaschema: A higher-level abstraction of the UMLS Semantic Network. Journal of Biomedical Informatics. 2003 Jun;35(3):194-212.

• McCray AT, Burgun A, Bodenreider O. Aggregating UMLS Semantic Types for Reducing Conceptual Complexity. In: Proc. Medinfo2001. London, UK; 2001. p. 171-175.

• Zhang L, Perl Y, Halper M, Geller J, Hripcsak G. A Lexical Metaschema for the UMLS Semantic Network. Artificial Intelligence in Medicine. 2005 Jan;33(1):41-59.

• Chen Y, Perl Y, Geller J, Hripcsak G, Zhang L. Comparing and Consolidating Two Heuristic Metaschemas. Journal of Biomedical Informatics. 2008 Apr;41(2):293-317.

• Zhang L, Perl Y, Halper M, Geller J. Designing Metaschemas for the UMLS Enriched Semantic Network. Journal of Biomedical Informatics. 2003 Dec;36(6):433-449.


Thank you


Auxiliary Material on Meta-Abstraction Networks


Metaschema

• A metaschema comprises a collection of nodes, each a group of connected semantic types following some criterion.

• For the cohesive metaschema, the criterion is a set of semantic types with (almost) the same relationships.

– The result is a collection of disjoint, singly-rooted, connected sets called meta-semantic types.

– These sets are promoted to meta nodes to form the cohesive metaschema.

[Figure: the semantic types Anatomical Abnormality, Congenital Abnormality, and Acquired Abnormality grouped into the meta-semantic type Anatomical Abnormality (3)]
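The grouping criterion can be illustrated with a simplified sketch. It assumes the semantic types form an IS-A tree and, as a deliberate simplification, requires *exactly* equal relationship sets where the slide says "(almost) same"; the published derivation of the cohesive metaschema is not reproduced here. The function `group_types` and the relationship labels `r1` and `r2` are hypothetical placeholders.

```python
from typing import Dict, FrozenSet, Optional, Set


def group_types(parent: Dict[str, Optional[str]],
                rels: Dict[str, FrozenSet[str]]) -> Dict[str, Set[str]]:
    """Group each type with its parent when their relationship sets match.

    Returns a mapping from each group's root type to the group's members.
    Groups are disjoint, connected, and singly rooted by construction.
    """
    root_of: Dict[str, str] = {}

    def resolve(t: str) -> str:
        if t not in root_of:
            p = parent[t]
            if p is not None and rels[t] == rels[resolve(p)]:
                root_of[t] = resolve(p)   # same relationships: join parent's group
            else:
                root_of[t] = t            # different (or no parent): new group root
        return root_of[t]

    groups: Dict[str, Set[str]] = {}
    for t in parent:
        groups.setdefault(resolve(t), set()).add(t)
    return groups


# Toy IS-A tree; relationship sets are placeholders, not actual SN relationships.
parent = {
    "Anatomical Structure": None,
    "Anatomical Abnormality": "Anatomical Structure",
    "Congenital Abnormality": "Anatomical Abnormality",
    "Acquired Abnormality": "Anatomical Abnormality",
}
rels = {
    "Anatomical Structure": frozenset({"r1"}),
    "Anatomical Abnormality": frozenset({"r1", "r2"}),
    "Congenital Abnormality": frozenset({"r1", "r2"}),
    "Acquired Abnormality": frozenset({"r1", "r2"}),
}
groups = group_types(parent, rels)
```

With this toy input, the three abnormality types end up in one group rooted at Anatomical Abnormality, mirroring the "(3)" node in the figure.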


The cohesive metaschema hierarchy.


Semantic Groups

• A partition of the SN into disjoint groups was proposed based on six general principles: semantic validity (assessable by connectivity), parsimony, completeness, exclusivity, naturalness, and utility.

• Its application yielded a collection of 15 so-called “semantic groups” (“SGs”), each comprising a set of semantic types.

• The SGs form the nodes of a meta-abstraction structure that we call the SG collection. Example SGs include: Genes & Molecular Sequences (containing five semantic types), Activities & Behaviors (nine semantic types), Anatomy (11), and Chemicals & Drugs (26). The semantic types within some SGs are not connected in the SN.
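Two of the six principles lend themselves to a mechanical check. The sketch below reads exclusivity as "no semantic type belongs to two groups" and completeness as "every semantic type belongs to some group"; that reading, the function `check_partition`, and the placeholder type names `t1`..`t5` are assumptions for illustration. Only the group names come from the slide.

```python
from typing import Dict, Iterable, Set, Tuple


def check_partition(groups: Dict[str, Set[str]],
                    all_types: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (types assigned to more than one group, types assigned to none)."""
    seen: Set[str] = set()
    overlaps: Set[str] = set()
    for members in groups.values():
        overlaps |= seen & members   # already placed in an earlier group
        seen |= members
    missing = set(all_types) - seen
    return overlaps, missing


# Placeholder semantic types; the memberships are made up for the example.
toy_groups = {
    "Anatomy": {"t1", "t2"},
    "Chemicals & Drugs": {"t3"},
    "Activities & Behaviors": {"t3", "t4"},  # t3 deliberately duplicated
}
overlaps, missing = check_partition(toy_groups, ["t1", "t2", "t3", "t4", "t5"])
```

A valid partition yields two empty sets; here the deliberately flawed toy input reports `t3` as violating exclusivity and `t5` as violating completeness.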


Characteristics of Meta-Abstraction Networks

• The SG collection is a coarser-grained view of the Metathesaurus than the SN, intended to reduce complexity.

• Both the cohesive metaschema and the SG collection are disjoint.

• The SG collection is extrinsic, derived from the subject areas covered by the SN.

• The metaschema is intrinsic, derived from the SN itself.

• The abstraction ratios, defined with respect to the SN, are 5:1 for the metaschema and 9:1 for the SG collection.
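An abstraction ratio of this kind can be computed as the number of nodes in the source network divided by the number of nodes in its abstraction. The sketch below assumes that definition. The count of 15 semantic groups is taken from the slides; the count of 135 semantic types in the SN is an assumption, chosen because it is consistent with the stated 9:1 ratio, and is not given in the deck.

```python
def abstraction_ratio(n_source_nodes: int, n_abstract_nodes: int) -> float:
    """Nodes in the source network per node in its abstraction."""
    if n_abstract_nodes <= 0:
        raise ValueError("abstraction must contain at least one node")
    return n_source_nodes / n_abstract_nodes


ASSUMED_SN_TYPES = 135   # assumption: not stated on the slides
SG_COUNT = 15            # from the slides: 15 semantic groups

sg_ratio = abstraction_ratio(ASSUMED_SN_TYPES, SG_COUNT)
print(f"SG collection: about {sg_ratio:.0f}:1")
```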