Institut für Informatik
Automatische Sprachverarbeitung
The Impact of Semantic Handshakes
TMRA 2006, Leipzig, 12.10.2006
Lutz Maicher, University of Leipzig




Agenda

• The Integration Model of the TMDM
• Semantic Handshakes and Interaction Protocols
• Simulations
• Result and Discussion


Preliminary Remark

• This presentation only describes the impact of a phenomenon that follows from the existence of
– the integration model of the TMDM (Topic Maps Data Model)
– Topic Maps communication protocols such as TMRAP, TMIP, etc.
• This presentation does not propose anything new
– no methodologies, technologies, paradigms or anything else


The Integration Model of the TMDM


• Two Topic Items are equal, that is, they represent the same Subject, if they have (TMDM 5.3.5):

– at least one equal string in their [subject identifiers] properties,

– at least one equal string in their [item identifiers] properties,

– at least one equal string in their [subject locators] properties,

– an equal string in the [subject identifiers] property of the one topic item and the [item identifiers] property of the other, or

– the same information item in their [reified] properties.

• Equal Topic Items A and B have to be merged into C (TMDM 6.2)

– ….

– Set C's [subject identifiers] property to the union of the values of A and B's [subject identifiers] properties.

– ….
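The equality rule and the one merging step quoted above can be sketched in Python. This is a minimal illustration, not an implementation of the TMDM: the class `TopicItem` and the functions `topics_equal` and `merge` are hypothetical names, only the properties named on the slide are modelled, and taking the union of all three identifier sets in `merge` is an assumption of the sketch (the slide shows only the [subject identifiers] step and elides the rest).

```python
from dataclasses import dataclass, field


@dataclass
class TopicItem:
    # Hypothetical, reduced model: only the properties named on the slide.
    subject_identifiers: set = field(default_factory=set)
    item_identifiers: set = field(default_factory=set)
    subject_locators: set = field(default_factory=set)
    reified: object = None


def topics_equal(a, b):
    """Equality rule as listed on the slide (TMDM 5.3.5)."""
    return bool(
        a.subject_identifiers & b.subject_identifiers
        or a.item_identifiers & b.item_identifiers
        or a.subject_locators & b.subject_locators
        # a subject identifier of one topic equals an item identifier of the other
        or a.subject_identifiers & b.item_identifiers
        or a.item_identifiers & b.subject_identifiers
        # the same information item in both [reified] properties
        or (a.reified is not None and a.reified is b.reified)
    )


def merge(a, b):
    """Create C from equal topics A and B by taking unions of the identifier sets.

    Assumption: all other merging steps (elided on the slide) are ignored here.
    """
    return TopicItem(
        subject_identifiers=a.subject_identifiers | b.subject_identifiers,
        item_identifiers=a.item_identifiers | b.item_identifiers,
        subject_locators=a.subject_locators | b.subject_locators,
        reified=a.reified if a.reified is not None else b.reified,
    )
```

With `TopicItem({"ns1:LutzMaicher"})` and `TopicItem({"ns2:MaicherLutz"})` the rule does not fire; as soon as one of the two topics carries both identifiers, the topics are equal and can be merged.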


The Integration Model of the TMDM in practice

In the case of terminological diversity:
Topic A: [subject identifier] {ns1:LutzMaicher}
Topic B: [subject identifier] {ns2:MaicherLutz}
Equality does not hold (according to the TMDM), so A and B remain unmerged.


The Integration Model of the TMDM in practice

In the case of terminological alignment (the PSI case):
Topic A: [subject identifier] {ns1:LutzMaicher}
Topic B: [subject identifier] {ns1:LutzMaicher}
Equality holds (according to the TMDM), so A and B are merged (according to the TMDM) into
Topic C: [subject identifier] {ns1:LutzMaicher}
But who can enforce universal vocabularies?


Semantic Handshakes and Interaction Protocols


Semantic Handshake

Topic A: [subject identifier] {ns1:LutzMaicher, ns2:MaicherLutz}
Topic B: [subject identifier] {ns2:MaicherLutz}
Equality holds (according to the TMDM), so A and B are merged (according to the TMDM) into
Topic C: [subject identifier] {ns1:LutzMaicher, ns2:MaicherLutz}
The author of A has decided that both terms can be used to indicate Lutz Maicher.


Local Semantic Handshakes and Interaction Protocols

Topic A: [subject identifier] {ns1:LutzMaicher, ns2:MaicherLutz} (Local Semantic Handshake)
Topic B: [subject identifier] {ns2:MaicherLutz, ns3:ML} (Local Semantic Handshake)
Topic C: [subject identifier] {ns3:ML}
Topic D: [subject identifier] {ns4:Lutz, ns3:ML} (Local Semantic Handshake)
The Topics reside in four Topic Maps: TM1, TM2, TM3 and TM4.
All Topic Maps interact using the existing protocols such as TMRAP, TMIP …


Local Semantic Handshakes and Interaction Protocols

Step 1
Topic A: [subject identifier] {ns1:LutzMaicher, ns2:MaicherLutz}
Topic B: [subject identifier] {ns2:MaicherLutz, ns3:ML}
Topic C: [subject identifier] {ns3:ML}
Topic D: [subject identifier] {ns4:Lutz, ns3:ML}
Request: Do you have a Topic Item with "ns1:LutzMaicher" or "ns2:MaicherLutz" in the property [subject identifier]? (Do you have information about the Subject Lutz Maicher?)


Local Semantic Handshakes and Interaction Protocols

Step 1
Topic A: [subject identifier] {ns1:LutzMaicher, ns2:MaicherLutz}
Topic B: [subject identifier] {ns2:MaicherLutz, ns3:ML}
Topic C: [subject identifier] {ns3:ML}
Topic D: [subject identifier] {ns4:Lutz, ns3:ML}
Request: Do you have a Topic Item with "ns1:LutzMaicher" or "ns2:MaicherLutz" in the property [subject identifier]? (Do you have information about the Subject Lutz Maicher?)
Responses:
– A: ns2:MaicherLutz, ns1:LutzMaicher
– B: ns2:MaicherLutz, ns3:ML
– C: NO
– D: NO


Local Semantic Handshakes and Interaction Protocols

Step 2
Topic A: [subject identifier] {ns1:LutzMaicher, ns2:MaicherLutz, ns3:ML}
Topic B: [subject identifier] {ns1:LutzMaicher, ns2:MaicherLutz, ns3:ML}
Topic C: [subject identifier] {ns3:ML}
Topic D: [subject identifier] {ns4:Lutz, ns3:ML}
Request: Do you have a Topic Item with "ns1:LutzMaicher", "ns2:MaicherLutz" or "ns3:ML" in the property [subject identifier]?


Local Semantic Handshakes and Interaction Protocols

Step 2
Topic A: [subject identifier] {ns1:LutzMaicher, ns2:MaicherLutz, ns3:ML}
Topic B: [subject identifier] {ns1:LutzMaicher, ns2:MaicherLutz, ns3:ML}
Topic C: [subject identifier] {ns3:ML}
Topic D: [subject identifier] {ns4:Lutz, ns3:ML}
Request: Do you have a Topic Item with "ns1:LutzMaicher", "ns2:MaicherLutz" or "ns3:ML" in the property [subject identifier]?
Responses:
– A: ns1:LutzMaicher, ns3:ML, ns2:MaicherLutz
– B: ns1:LutzMaicher, ns3:ML, ns2:MaicherLutz
– C: ns3:ML
– D: ns4:Lutz, ns3:ML


Local Semantic Handshakes lead to Global Integration

Topic A: [subject identifier] {ns1:LutzMaicher, ns2:MaicherLutz, ns3:ML, ns4:Lutz}
Topic B: [subject identifier] {ns1:LutzMaicher, ns2:MaicherLutz, ns3:ML}
Topic C: [subject identifier] {ns1:LutzMaicher, ns2:MaicherLutz, ns3:ML, ns4:Lutz}
Topic D: [subject identifier] {ns1:LutzMaicher, ns2:MaicherLutz, ns3:ML, ns4:Lutz}
The Topics reside in four Topic Maps: TM1, TM2, TM3 and TM4.
Global Integration through Local Semantic Handshakes.
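The request and response steps walked through on the preceding slides can be imitated with a small fixpoint loop. This is only a sketch under simplifying assumptions: `interact` is a hypothetical function, every topic may ask every other topic directly, and the wire formats of TMRAP or TMIP are not modelled at all. The loop runs until no identifier set changes any more, so every topic ends with all four identifiers.

```python
def interact(topic_maps):
    """Repeat request/response rounds until no identifier set changes.

    topic_maps maps a topic name to its set of subject identifiers.
    A topic that receives a response sharing at least one identifier
    merges the response into its own set (union, as in TMDM merging).
    """
    sets = {name: set(ids) for name, ids in topic_maps.items()}
    changed = True
    while changed:
        changed = False
        for asker in sets:
            for other in sets:
                if other == asker:
                    continue
                # "Do you have a Topic Item with one of my identifiers?"
                if sets[asker] & sets[other]:
                    merged = sets[asker] | sets[other]
                    if merged != sets[asker]:
                        sets[asker] = merged
                        changed = True
    return sets


if __name__ == "__main__":
    start = {
        "A": {"ns1:LutzMaicher", "ns2:MaicherLutz"},
        "B": {"ns2:MaicherLutz", "ns3:ML"},
        "C": {"ns3:ML"},
        "D": {"ns4:Lutz", "ns3:ML"},
    }
    for name, ids in interact(start).items():
        print(name, sorted(ids))
```

No topic author ever had to know all four terms: each local handshake only links two of them, and the interaction spreads the links.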


Hypothesis and Simulation Design


Hypothesis

• Due to the existence of the TMDM and of interaction protocols, terminological diversity will be resolved to global integration if the majority of Topics disclose one local Semantic Handshake.

• Simulations for testing the hypothesis …


Simulation Design

Create Topics
– Create a number (cardE) of Topics which are assumed to exist in the world and to represent the same Subject by definition
– All Topics can always interact with each other
Add Subject Identifiers randomly
– Draw a number of Subject Identifiers (nbrOfDifferentII) which should be assigned to the Topic according to a given distribution (distributionNbrOfII)
• if the number is 1, no semantic handshake is done
• if the number is bigger than 1, semantic handshakes are done
– Draw for each Subject Identifier of a Topic an integer according to a given distribution (distributionII) in the range [1..nbrOfII]
Start Interaction between Topics
– If two Topics have an identical number in their sets of Subject Identifiers, they are merged (the sets of Subject Identifiers of both Topics become the union of the original sets)
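The three phases of this design can be sketched as follows. The code is an illustration under assumptions, not the simulation used for the talk: `simulate` and `count_clouds` are hypothetical names, the two random choices are passed in as plain callables instead of the distribution notation of the slides, and merging is repeated until nothing changes, which corresponds to unlimited interaction between all Topics.

```python
import random


def simulate(card_e, ids_per_topic, draw_id, rng):
    """Return the identifier sets of card_e topics after merging to a fixpoint.

    ids_per_topic(rng) -> how many identifiers a topic gets (1 = no handshake)
    draw_id(rng)       -> one identifier, represented as an integer
    """
    # Create topics and assign subject identifiers randomly.
    # Note: two draws may coincide, so a set can be smaller than requested.
    topics = [
        {draw_id(rng) for _ in range(ids_per_topic(rng))}
        for _ in range(card_e)
    ]
    # Interaction: topics sharing an identifier both take the union of their sets.
    changed = True
    while changed:
        changed = False
        for i in range(card_e):
            for j in range(i + 1, card_e):
                if topics[i] & topics[j] and topics[i] != topics[j]:
                    union = topics[i] | topics[j]
                    topics[i] = union
                    topics[j] = set(union)
                    changed = True
    return topics


def count_clouds(topics):
    """At the fixpoint, equal topics carry identical sets, so distinct sets = clouds."""
    return len({frozenset(ids) for ids in topics})


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed example parameters: 100 topics, 100 possible terms,
    # 55% of the topics draw two identifiers (a local handshake).
    result = simulate(
        100,
        lambda r: 2 if r.random() < 0.55 else 1,
        lambda r: r.randint(1, 100),
        rng,
    )
    print("clouds:", count_clouds(result))
```

The pairwise loop is quadratic in the number of Topics, which is acceptable for the sizes discussed here (100 Topics); a union-find structure would be the usual choice for larger runs.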


Definition of a Distribution

• Distributions are defined as follows:

<{0.8,1.0},6> is similar to the lottery
– that 1, 2 or 3 is drawn with the probability 80%
– that 4, 5 or 6 is drawn with the probability 20%

<{0.8,0.9,0.97,1.0},100> is similar to the lottery
– that a number in [1,25] is drawn with the probability 80%
– that a number in [26,50] is drawn with the probability 10%
– that a number in [51,75] is drawn with the probability 7%
– that a number in [76,100] is drawn with the probability 3%
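One way to read this notation is as a list of cumulative probabilities over equally wide buckets of the range [1..n]. The sketch below follows that reading; the function names are hypothetical, and two points are assumptions not stated on the slide: numbers inside a bucket are drawn uniformly, and n is divisible by the number of buckets.

```python
import random


def pick_bucket(cumulative, u):
    """Return the index of the first bucket whose cumulative probability exceeds u."""
    for index, threshold in enumerate(cumulative):
        if u < threshold:
            return index
    return len(cumulative) - 1  # guard for a last threshold below 1.0


def bucket_bounds(index, buckets, n):
    """Inclusive bounds of bucket `index` when [1..n] is split into equal buckets."""
    width = n // buckets
    low = index * width + 1
    return low, low + width - 1


def draw(cumulative, n, rng):
    """Draw one integer from the distribution <cumulative, n>."""
    index = pick_bucket(cumulative, rng.random())
    low, high = bucket_bounds(index, len(cumulative), n)
    return rng.randint(low, high)  # assumption: uniform inside the bucket


if __name__ == "__main__":
    rng = random.Random(42)
    print([draw([0.8, 1.0], 6, rng) for _ in range(10)])
```

For <{0.8,0.9,0.97,1.0},100> the second bucket is `bucket_bounds(1, 4, 100)`, that is (26, 50), and it is chosen whenever the uniform number falls in [0.8, 0.9), which matches the 10% on the slide.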


Analysis - Measures

• Measures of interest (after some iterations)
– Number of independent clusters (integration clouds), clouds(E)
• an integration cloud is a set of Topics which are equal
• the lower the better; clouds(E) = 1 means global integration
– Average size of the integration clouds, card(T)

$$\mathrm{card}(T) = \frac{\sum_{e_i \in E} \mathrm{card}(T_i)}{\mathrm{card}(E)}$$

where T_i is the integration cloud that contains Topic e_i
• the higher the better; card(T) = card(E) means global integration

Examples (two figures with nine Topics each):
– clouds(E) = 3, card(T) = 33/9 ≈ 3.7
– clouds(E) = 2, card(T) = 41/9 ≈ 4.6
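The two example values can be checked by reading card(T) as the average, taken over all Topics, of the size of the cloud each Topic belongs to. The sketch below does this for given cloud sizes; `measures` is a hypothetical helper, and the cloud sizes 4, 4, 1 for the first example are an assumption (sizes 5, 2, 2 would also give 33/9), while 5 and 4 are the only sizes that give 41/9 with two clouds.

```python
from fractions import Fraction


def measures(cloud_sizes):
    """Return (clouds(E), card(T)) for a list of integration cloud sizes.

    Every topic in a cloud of size s contributes s to the sum,
    so a cloud contributes s * s in total.
    """
    card_e = sum(cloud_sizes)
    total = sum(size * size for size in cloud_sizes)
    return len(cloud_sizes), Fraction(total, card_e)


if __name__ == "__main__":
    for sizes in ([4, 4, 1], [5, 4], [9]):
        clouds, card_t = measures(sizes)
        print(sizes, clouds, float(card_t))
```

With a single cloud of nine Topics the measure equals card(E) = 9, which is the global integration case named on the slide.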


Experiment Series


Simulation: Global Ontology (the PSI Case)

• No simulation is necessary
– each Topic has the same, globally unique Subject Identifier
– clouds(E) = 1 (Global Integration)
– card(T) = card(E)

… but the enforcement of global ontologies is an overly optimistic premise!


Simulation: Heterogeneous World without Semantic Handshakes

Iteration of nbrOfDifferentII in [5,100]
general parameter: card(E)=100, distributionNbrOfII=<{1.0},1> (no Semantic Handshakes)
specific parameter exp01: distributionII=<{1.0},100>
specific parameter exp02: distributionII=<{0.8,0.9,0.95,1.0},100> (some terms are more prominent)

100 different terms will be resolved to fewer than 40 integration clouds because some authors use the same term by chance (especially the most prominent terms).


Simulation: The Impact of Semantic Handshakes

Iteration of a in distributionNbrOfII=<{a,1.0},2> in [0.0,1.0] (from no semantic handshakes to always a semantic handshake)
general parameters: card=100, nbrOfDifferentII=100
specific parameters exp03: distributionII=<{1.0},100> (high terminological diversity)
specific parameters exp04: distributionII=<{0.8,0.9,0.97,1.0},100> (some terms are more prominent)

100 different terms will be resolved to ten integration clouds if only 55% of all Topics disclose a Semantic Handshake!


Simulation: The Impact of Terminological Diversity

Iteration of nbrOfDifferentII in [2,100]
general parameters: cardE=100, distributionII=<{1.0},100>
specific parameter exp05: distributionNbrOfII=<{0.2,1.0},2>
specific parameter exp06: distributionNbrOfII=<{0.8,1.0},2>

Plot annotations: high terminological diversity, low terminological diversity; semantic handshake by the minority, semantic handshake by the majority

50 different terms will be resolved to global integration if 80% of all Topics disclose a Semantic Handshake!


Result and Discussion


Result

• The hypothesis is confirmed: Global Integration will be reached if a significant number (a majority) of Topics disclose one semantic handshake.
– Remark
• the effect only appears if interaction links exist between all topic maps
• the point in time at which the effect appears depends on the interaction frequency

• The more prominent the used terms are, the lower the global number of semantic handshakes necessary for global integration.

• Design Recommendation:
– Assign two (prominent) Subject Identifiers to each Topic you create. (You don't have to be aware of all existing terms for your concept.)


Discussion

• These findings include problems concerning
– Wrong Semantic Handshakes (by mistake or on purpose)
• Homonymy (= the same term for different concepts)
• Trust (Can I trust the local Semantic Handshakes?)

• … but they are implied by the existence of the
– TMDM and
– Topic Maps Interaction Protocols


Questions?!