Dr. W. Ceusters, Language & Computing nv, www.landc.be
L&C’s LinkBase: a multi-lingual hub to medical terminologies
Dr. W. Ceusters
Dir R&D
Language & Computing nv
Presentation overview
• Short history of L&C
• L&C’s integrated approach to medical natural language understanding
– Focus on medical terminology management
• Position in the international market
• Relevant demonstrations
– LinkFactory
– Ontology Browser
Goal of Language & Computing nv
To provide users and developers of systems for knowledge management with tools and services for efficient and accurate data-entry and retrieval by exploiting the full power of automated (medical) natural language understanding
We hereby declare ...
Language Engineering
[Diagram labels: speech recognition; TTS; natural language understanding; text generation; speech; text; semantic representations; language models; semantic models; dialogue models; speech models; information processing]
The three pillars of Healthcare IT
EHCRS
Language
Terminology
• Individual patient care; seamless care; historical overview; ...
• Comparability of data; cross-border care; decision support; abstraction / grouping; ...
• Faithful data recording; sufficient level of detail; ...
Domain of discourse: healthcare
History of R&D in L&C
[Chart, 1990 to 2001. Series: Employees; Share value (1000 Euro); R/D ratio. Labelled items: Anthem, Multi-Tale, Dome, GIU, Select, C-Care, Liquid, Mobidev]
L&C’s integrated approach
The L&C integrated solution
Data structure and function library for language understanding
Medical and linguistic knowledge required for language understanding
NLU enabling tools for knowledge supported data-entry and -retrieval
The L&C Linguistic Concept Factory
Linguistic-semantic Function Library
C-DEFINE(c-meningitis, c-inflammation HAS-LOC c-meninges)
T-DEFINE(“méningite”, french, c-meningitis)
Storage Functions
Retrieval Functions
GET-TERMS(c-meningitis, {french, dutch})
“méningite”, “hersenvliesontsteking”
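The storage and retrieval calls above can be mimicked with a small in-memory store. This is only a sketch: the class `ConceptStore` and its method names are hypothetical stand-ins for C-DEFINE, T-DEFINE and GET-TERMS, not the L&C function library itself. The slide shows only the French T-DEFINE; the Dutch registration is assumed here because the Dutch term appears in the GET-TERMS result.

```python
from typing import Dict, Iterable, List, Tuple


class ConceptStore:
    """Hypothetical in-memory stand-in for the storage/retrieval functions."""

    def __init__(self) -> None:
        # concept -> (parent concept, link type, link target)
        self.definitions: Dict[str, Tuple[str, str, str]] = {}
        # concept -> language -> list of terms
        self.terms: Dict[str, Dict[str, List[str]]] = {}

    def c_define(self, concept: str, parent: str, link: str, target: str) -> None:
        """Store a concept definition, e.g. an inflammation located in the meninges."""
        self.definitions[concept] = (parent, link, target)

    def t_define(self, term: str, language: str, concept: str) -> None:
        """Attach a term in a given language to a concept."""
        self.terms.setdefault(concept, {}).setdefault(language, []).append(term)

    def get_terms(self, concept: str, languages: Iterable[str]) -> List[str]:
        """Return the terms for a concept, in the order the languages are given."""
        by_language = self.terms.get(concept, {})
        return [t for lang in languages for t in by_language.get(lang, [])]


store = ConceptStore()
store.c_define("c-meningitis", "c-inflammation", "HAS-LOC", "c-meninges")
store.t_define("méningite", "french", "c-meningitis")
store.t_define("hersenvliesontsteking", "dutch", "c-meningitis")  # assumed call
terms = store.get_terms("c-meningitis", ["french", "dutch"])
print(terms)
```

Keeping terms in a separate table keyed by concept and language is what lets one concept definition serve every language.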
Architectural overview
[Diagram labels: PC; Mac; Unix Workstation; LinkFactory Workbench; LAN; WAN; Internet; RMI, Corba, Soap; LinkFactory Server; Java; JDBC; LinkBase Database; Server Business Objects: Concept tree, Linktype tree, Criteria / Full definitions, Translate, ...]
Client Graphical Objects
Built-in quality control
• Knowledge entered is immediately used to check validity of subsequent entries
• Version management
• User management with:
– Allowed actions based on experience
– Personal audit trail
• Clear and formal separation from 3rd party systems to avoid copying mistakes such as:
– UMLS’ cyclical ISA relationships
– SNOMED-RT’s “very usual = always” modelling
– Most systems’ overloaded hierarchical relations
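One way to picture "knowledge entered is immediately used to check validity of subsequent entries" is a hierarchy that refuses any new IS-A link that would close a cycle. The sketch below is an illustration under that assumption, not LinkFactory's actual check; the class `IsAHierarchy` and the concept names are hypothetical.

```python
from typing import Dict, Set


class IsAHierarchy:
    """Hypothetical IS-A store that rejects links creating a cycle."""

    def __init__(self) -> None:
        self.parents: Dict[str, Set[str]] = {}

    def ancestors(self, concept: str) -> Set[str]:
        """Collect every concept reachable by following IS-A links upwards."""
        seen: Set[str] = set()
        stack = list(self.parents.get(concept, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.parents.get(node, ()))
        return seen

    def add_is_a(self, child: str, parent: str) -> None:
        """Add child IS-A parent unless the parent already descends from the child."""
        if child == parent or child in self.ancestors(parent):
            raise ValueError(f"{child} IS-A {parent} would create a cycle")
        self.parents.setdefault(child, set()).add(parent)


hierarchy = IsAHierarchy()
hierarchy.add_is_a("meningitis", "inflammation")
hierarchy.add_is_a("inflammation", "disease")

try:
    hierarchy.add_is_a("disease", "meningitis")  # would close a loop
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Because the check runs at entry time, the stored hierarchy never contains a cycle that has to be repaired later.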
The L&C Linguistic Concept Database
[Diagram labels: Formal Domain Ontology; Cassandra Linguistic Ontology; Language A (Lexicon, Grammar); Language B (Lexicon, Grammar); MEDRA; ICD; SNOMED; ICPC; Others ...; Proprietary Terminologies]
A formal terminology
• Separation of terms and concepts
• To be used by machines, not people
• All information is explicit in the structure, not implicit in the terms
• Clean subsumption hierarchies
• Formal, “computable” definitions of concepts
• Internal, automated quality control
Example: Joint anatomy
• joint HAS-HOLE joint space
• joint capsule IS-OUTER-LAYER-OF joint
• meniscus
– IS-INCOMPLETE-FILLER-OF joint space
– IS-TOPO-INSIDE joint capsule
– IS-NON-TANGENTIAL-MATERIAL-PART-OF joint
• joint
– IS-CONNECTOR-OF bone X
– IS-CONNECTOR-OF bone Y
• synovia
– IS-INCOMPLETE-FILLER-OF joint space
• synovial membrane IS-BONAFIDE-BOUNDARY-OF joint space
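The joint-anatomy statements above can be read as (source, link type, target) triples. The following sketch stores them that way and answers a simple question against them. The `Link` type and the helper functions are hypothetical names chosen for illustration; only the triples themselves come from the slide.

```python
from typing import List, NamedTuple


class Link(NamedTuple):
    source: str
    link_type: str
    target: str


# Triples transcribed from the joint anatomy example.
JOINT_ANATOMY: List[Link] = [
    Link("joint", "HAS-HOLE", "joint space"),
    Link("joint capsule", "IS-OUTER-LAYER-OF", "joint"),
    Link("meniscus", "IS-INCOMPLETE-FILLER-OF", "joint space"),
    Link("meniscus", "IS-TOPO-INSIDE", "joint capsule"),
    Link("meniscus", "IS-NON-TANGENTIAL-MATERIAL-PART-OF", "joint"),
    Link("joint", "IS-CONNECTOR-OF", "bone X"),
    Link("joint", "IS-CONNECTOR-OF", "bone Y"),
    Link("synovia", "IS-INCOMPLETE-FILLER-OF", "joint space"),
    Link("synovial membrane", "IS-BONAFIDE-BOUNDARY-OF", "joint space"),
]


def sources_of(link_type: str, target: str, links: List[Link]) -> List[str]:
    """Concepts that point at `target` through `link_type`, sorted by name."""
    return sorted({l.source for l in links
                   if l.link_type == link_type and l.target == target})


def links_from(source: str, links: List[Link]) -> List[Link]:
    """All outgoing links of one concept, in the order they were entered."""
    return [l for l in links if l.source == source]


fillers = sources_of("IS-INCOMPLETE-FILLER-OF", "joint space", JOINT_ANATOMY)
joint_links = links_from("joint", JOINT_ANATOMY)
print(fillers)
```

Here the query answers "what partially fills the joint space?" directly from the structure, without looking at any term.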
Example: Relative spatial localisation
[Diagram of link types:]
IS-TOPO-INSIDE-OF
IS-GEO-INSIDE-OF
IS-INSIDE-CONVEX-HULL-OF
IS-PARTLY-IN-CONVEX-HULL-OF
IS-OUTSIDE-CONVEX-HULL-OF
HAS-DISCONNECTED-REGION
HAS-EXTERNAL-CONNECTING-REGION
HAS-DISCRETED-REGION
HAS-TANG.-SPAT.-PART
HAS-NON-TANG.-SPAT.-PART
IS-SPAT.-EQUIV.-OF
IS-TANG.-SPAT.-PART-OF
IS-NON-TANG.-SPAT.-PART-OF
HAS-PARTIAL-SPATIAL-OVERLAP
HAS-PROPER-SPATIAL-PART
IS-PROPER-SPAT.-PART-OF
HAS-SPATIAL-PART
IS-SPATIAL-PART-OF
HAS-OVERLAPPING-REGION
HAS-CONNECTING-REGION
HAS-SPATIAL-POINT-REFERENCE
Example: Patient at risk (risk patient)
[Diagram, built up in four numbered steps. Concepts: Generalised Possession; Having a healthcare phenomenon; Healthcare phenomenon; Human; Patient; Patient at risk; Risk Factor; Patient at risk for osteoporosis; Risk factor for osteoporosis; Osteoporosis. Link types: IS-A; Has-possessor; Has-possessed; Is-possessor-of; Has-Healthcare-phenomenon; Is-Risk-Factor-Of]
LinkBase size per 01-04-2001
• 920,000 (850,000) concepts
• 2,300,000 terms
• 320 link-types
• 2,000,000 link instances
• 300,000 links to 3rd party systems
• But:
– Never finished!
– Quality sufficient for current applications
L&C Linguistic components
[Diagram labels: Text; Processor; Result; Domain representation; Goal representation; Linguistic Knowledge; Task Knowledge; Formal domain ontology]
L&C application servers
• Coding tools: FastCode
• Semantic indexers: Tessi
• Spell checkers and type-ahead: FastType
• Semi-controlled language parsers in restricted domains: FreePharma
• Ontology browser
• Stochastic dependency-based indexer: C-Link
• (Ir)relevant document classifier for very low prevalence data sets
Integrated coding approach
[Diagram labels: FastCode Generator; LinC-Factory; Formal representation of Classification system; LinCBase; Mapping data; Domain+Linguistic ontology; FastCode client; FastCode server; Coding data]
Benefits of formal multi-lingual terminology management
Semi-automatic mapping (ICPC-ICD10)
Zenker’s diverticulum (D84)
Acquired diverticulum of esophagus (K22.5)
[Diagram concepts: diverticulum; esophagus; pressure; intraluminal; Acquired. Link types: HAS-LOC; HAS-CAUSE; HAS-ORIG; HAS-AcqMode]
Reclassify: FOOT EXARTICULATION
• Definitions given by domain-expert:
– ( ( FOOT EXARTICULATION ) { [ IS_A ] ( EXARTICULATION ) } { [ HAS_THEME ] ( FOOT ) } )
– ( ( AMPUTATION OF FOOT ) { [ IS_A ] ( AMPUTATION ) } { [ HAS_THEME ] ( FOOT ) } )
– ( ( EXARTICULATION ) { [ IS_A ] ( AMPUTATION ) } { [ HAS_SOURCE ] ( JOINT ) } )
• Redefinition by automatic classifier:
– ( ( FOOT EXARTICULATION ) { [ IS_A ] ( AMPUTATION OF FOOT ) } { [ IS_A ] ( EXARTICULATION ) } )
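The reclassification above can be reproduced with a very small structural classifier. The sketch below is not the L&C classifier: it assumes each expert definition is both necessary and sufficient, that criteria are inherited along IS_A, and that criterion targets are compared by exact match only. All function names are hypothetical.

```python
from typing import Dict, Set, Tuple

Definition = Dict[str, Set[str]]

# Expert definitions from the slide; AMPUTATION, FOOT and JOINT stay undefined.
DEFINITIONS: Dict[str, Definition] = {
    "FOOT EXARTICULATION": {"IS_A": {"EXARTICULATION"}, "HAS_THEME": {"FOOT"}},
    "AMPUTATION OF FOOT": {"IS_A": {"AMPUTATION"}, "HAS_THEME": {"FOOT"}},
    "EXARTICULATION": {"IS_A": {"AMPUTATION"}, "HAS_SOURCE": {"JOINT"}},
}


def told_ancestors(concept: str, defs: Dict[str, Definition]) -> Set[str]:
    """Every concept reachable through the stated IS_A links."""
    seen: Set[str] = set()
    stack = list(defs.get(concept, {}).get("IS_A", ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(defs.get(parent, {}).get("IS_A", ()))
    return seen


def criteria(concept: str, defs: Dict[str, Definition]) -> Set[Tuple[str, str]]:
    """Own plus inherited (link, target) criteria, excluding IS_A."""
    result: Set[Tuple[str, str]] = set()
    for c in {concept} | told_ancestors(concept, defs):
        for link, targets in defs.get(c, {}).items():
            if link != "IS_A":
                result.update((link, t) for t in targets)
    return result


def subsumes(general: str, specific: str, defs: Dict[str, Definition]) -> bool:
    """True if `general` is a stated or inferred ancestor of `specific`."""
    if general == specific:
        return False
    ancestors = told_ancestors(specific, defs)
    if general in ancestors:
        return True
    if general not in defs:
        return False  # undefined concepts subsume only where stated
    parents_ok = defs[general].get("IS_A", set()) <= ancestors
    return parents_ok and criteria(general, defs) <= criteria(specific, defs)


def classify(concept: str, defs: Dict[str, Definition]) -> Set[str]:
    """Most specific subsumers of `concept`, i.e. its inferred direct parents."""
    universe = set(defs) | {p for d in defs.values() for p in d.get("IS_A", ())}
    subsumers = {c for c in universe if subsumes(c, concept, defs)}
    return {c for c in subsumers
            if not any(subsumes(c, other, defs) for other in subsumers)}


parents = classify("FOOT EXARTICULATION", DEFINITIONS)
print(sorted(parents))
```

AMPUTATION drops out of the result because it is an ancestor of both remaining parents, which matches the redefinition shown on the slide.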
Detection of missing terms
Resolving conflicting views
MESH-2001: “Seizures”
MESH-2001: “Convulsions”
Snomed-RT: “Convulsion”
Snomed-RT: “Seizure”
L&C: Convulsion
L&C: Seizure
L&C: Health crisis
L&C: Epileptic convulsion
[Diagram link types: IS-A; ISA; IS-narrower-than; Has-CCC]
Position in the market
Main business model
Software developers
Integrators
Hospitals
Internet Service Providers
Pharmaceutical companies
Research Organisations
Medical Publishers
Government
Healthcare Insurance Companies
MCO
Project-based product development
Service Component
Product Component
Project Definition
Corpus analysis
Set up service
Product development
Workbench development
Teach and deliver
Current major partners/clients
• Coding tools
– Several hospitals using ICD-9-CM FastCode
• Terminology management services + NLU based data entry
– IDEWE: largest Belgian occupational medicine services provider
– First Databank UK
– Belgian military medical service
• Semantic indexing
– Belgian Professional Association of Pharmaceutical industry
Academic Competitors/Colleagues
• Main characteristics:
– Prototypes with very small coverage
– No professional support
• Relevant examples:
– OpenGalen (VUMAN):
  • Very small “LinkBase”
  • “Toy”-link to language (language ignored as medium)
– Protégé (Stanford):
  • Ontology editor
– Several DL-systems: FaCT, Cyclop, LOOM, ...
  • Tested with very small (tiny) ontologies
  • More powerful reasoning mechanisms than LinkFactory but totally intractable on ontologies of over a few thousand distinct concept classes
Commercial competitors/colleagues
• Health Language Inc.
• Apelon Inc.
– Ontyx
– Lexical Technologies
L&C’s strong position
• Multi-lingual and multi-cultural approach
• Modelling independent from specific languages but not from language as communication medium
• Proven scalability of our approach
• Support at all levels
– Services to migrate existing client dictionaries
– Large tool set for terminology development, maintenance, and/or use
• Only company with in-house expertise in medicine, computational linguistics in many languages, formal ontologies and informatics