Better architecture with semantic integration

Semantic integration

TMRA, 2010-10-01 Lars Marius Garshol

Architectural challenges

• System architecture is not enterprise architecture
  – what's good for a single system is not necessarily good for the enterprise as a whole
  – local decisions make sense locally, but not necessarily globally

• Organizational reorganizations
  – mergers, acquisitions, political reorganizations, ...
  – these all have implications for IT architecture

• Master data management
  – of course you only have a single customer database
  – ...until you buy a company that has its own

• Architecture dictators
  – are a tempting solution to impose some general structure,
  – but generally impede progress at the local level

Lego!

• The goal is not just an architecture that's correct today

• Because we know the situation will soon change

• The goal is an architecture that's easily adapted to tomorrow's environment

A step-by-step process

1. Reference model

2. Master data

3. Reference data

4. Generic services

5. Access control

6. Search

7. Semantic formats

Mapping the IT landscape

• Which IT systems exist?
• Which entities do they have?
• What are their properties?
• What web services exist?
• What is their input and output?

[Diagram: App, Entity, Property, Service, Format]

Step #1

The value of the map

• Living high-level view of systems and services
  – navigation/search/visualization,
  – connect to relevant documentation, where it exists,
  – far superior to PowerPoint/Word

• The main value is in the structure
  – that is, the use of a proper semantic model

• No need to build or buy software
  – it can all be done with existing open source components

Step #1

Problems with the map

• There is no connection across systems
  – shows that 7 systems have the concept "case"
  – but so what?

• What is needed is a cross-mapping
  – must show to what degree the 7 concepts overlap
  – perhaps there are other concepts that mean the same, but have different names?

Step #1

Build a reference model

• Model central concepts
  – entities and their properties
  – type hierarchy for both
  – independent of any specific system; model the organization's understanding

• This is not a canonical data model
  – not a format
  – not forcing any systems to actually use the model
  – systems can use entities/properties which do not (yet) exist in the model

Step #1

What is cohabitation?

• Law on individual pensions
  – LOV-2008-06-27-62, § 3-7
  – "Cohabitant" (samboer) here means a) a person with whom the customer shares a home and has children in common, b) a person with whom the customer lives in a marriage-like or partnership-like relationship, when it is substantiated that the relationship has lasted without interruption for the last five years before the customer's death, and there were no circumstances that would have prevented a lawful marriage or registered partnership from being entered into.

• Regulation on collecting information
  – FOR-2005-07-08-826, point 1
  – cohabitants: persons who live together and have children in common.

• Law on guardianship ("vergemål")
  – LOV-2010-03-26-9, § 2
  – In this law, cohabitants means two persons who live together in a marriage-like relationship.

• ...

Step #1

Modelling

[Diagram: the concept Cohabitation, shown together with Cohabitation PENSION, Cohabitation INNH, Cohabitation VERGE, Cohabitation XXX and Cohabitation YYY]

Step #1

Properties, too

[Diagram: Cohabitation and Person, with the properties cohab 1, cohab 2, start date, end date, ...]

Step #1
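The modelling slides above can be sketched in code. This is a minimal illustration, not the tooling used in the talk: the class `EntityType` and its methods are hypothetical, it only covers the entity hierarchy (the slides also call for a property hierarchy), and treating "Cohabitation PENSION" as a subtype of "Cohabitation" is an assumption read from the diagram.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EntityType:
    """A concept in the reference model, independent of any one system."""
    name: str
    parent: Optional["EntityType"] = None          # entity type hierarchy
    properties: set = field(default_factory=set)   # property names

    def is_a(self, other: "EntityType") -> bool:
        """True if this type is `other` or one of its descendants."""
        current = self
        while current is not None:
            if current is other:
                return True
            current = current.parent
        return False

    def all_properties(self) -> set:
        """Own properties plus those inherited from ancestors."""
        inherited = self.parent.all_properties() if self.parent else set()
        return inherited | self.properties


# Concept and property names are taken from the slides.
cohabitation = EntityType(
    "Cohabitation",
    properties={"cohab 1", "cohab 2", "start date", "end date"},
)
# Assumption: the law-specific concept is modelled as a subtype.
cohabitation_pension = EntityType("Cohabitation PENSION", parent=cohabitation)
```

Because the law-specific concept inherits from the general one, a property such as "start date" only has to be described once.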

Connect the system data

[Diagram: the system tables COHAB (App #1), COHABITERS (App #2) and BPG_COHAB (App #3) connected to the reference model concepts Cohabitation, Cohabitation PENSION, Cohabitation INNH, Cohabitation VERGE, Cohabitation XXX and Cohabitation YYY]

Step #1

Degrees of correlation

• Perfect correlation
  – not unusual, but not always the case either

• Specialization
  – that is, B is a narrower concept than A

• Overlap
  – A and B share a common subset

• Resembles
  – A and B are related, but the connection is not clear

[Diagram: two sets, A and B]

Step #1
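One way to record such a cross-mapping is a list of typed links from system concepts to reference model concepts. The sketch below is illustrative only: `Correlation`, `Mapping` and `mappings_for` are hypothetical names, and the degrees assigned to each table are invented for the example (the talk does not state them).

```python
from dataclasses import dataclass
from enum import Enum


class Correlation(Enum):
    PERFECT = "perfect"                # same concept
    SPECIALIZATION = "specialization"  # system concept is narrower
    OVERLAP = "overlap"                # the two share a common subset
    RESEMBLES = "resembles"            # related, connection unclear


@dataclass(frozen=True)
class Mapping:
    app: str
    system_concept: str   # e.g. a table name in that system
    model_concept: str    # concept in the reference model
    degree: Correlation


# Table names come from the slides; targets and degrees are invented.
MAPPINGS = [
    Mapping("App #1", "COHAB", "Cohabitation PENSION", Correlation.PERFECT),
    Mapping("App #2", "COHABITERS", "Cohabitation", Correlation.SPECIALIZATION),
    Mapping("App #3", "BPG_COHAB", "Cohabitation", Correlation.OVERLAP),
]


def mappings_for(model_concept, degrees=None):
    """Return the mappings to a model concept, optionally filtered by degree."""
    return [
        m for m in MAPPINGS
        if m.model_concept == model_concept
        and (degrees is None or m.degree in degrees)
    ]
```

With this in place, "which systems hold something like Cohabitation, and how closely?" becomes a lookup rather than a round of interviews.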

Uses of the model

• We describe to understand
  – we want to understand so that we can improve

• Analysis of the architecture
  – starting point for a restructuring
  – identify master data issues
  – etc

• A data dictionary
  – useful when converting legacy data
  – useful for bug fixing
  – key personnel no longer have to answer questions all the time
  – etc

Step #1

From documentation to services

• So far we've only discussed documentation for humans
  – this is highly useful in a number of ways
  – but it is only the beginning

• The model has a semantic structure
  – therefore we can use it to build new kinds of services

Step #1

Master data control

• Pick one system to be the master for each kind of data
  – where this can really be centralized

• Other systems needing the data must become clients of the master
  – this is a gradual transition

• We also need a protocol which the clients can use to retrieve the data

Step #2

A service broker

• A service which routes requests

• A layer above the ESB
  – the ESB takes care of transport
  – it might also broker between several ESBs

• Uses its knowledge of information and services

• Decouples clients from servers

• Makes the architecture a lot more flexible

[Diagram: Broker, Reference model, ESB]

Step #2

Master data protocol

• Used by clients to retrieve data updates

• Makes it possible to gradually migrate

• Master can change without the clients knowing

[Diagram labels: Broker; App #4; Sync request: entity + time; Reference model; Lookup; Atom (SDshare)]

Step #2
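The sync request (entity + time) can be sketched as follows. This is an in-memory illustration under stated assumptions, not the SDshare protocol itself: `Change`, `Master` and `Broker` are hypothetical classes, and the entity type "Customer" and its data are invented.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Change:
    entity_type: str
    entity_id: str
    updated: datetime
    data: dict


class Master:
    """Hypothetical master system that keeps a log of changes."""

    def __init__(self, name):
        self.name = name
        self._log = []

    def record(self, change):
        self._log.append(change)

    def changes_since(self, entity_type, since):
        return [c for c in self._log
                if c.entity_type == entity_type and c.updated > since]


class Broker:
    """Knows which system is master for each entity type."""

    def __init__(self):
        self._masters = {}

    def set_master(self, entity_type, master):
        self._masters[entity_type] = master

    def sync(self, entity_type, since):
        # The client names only the entity type and a time; the broker
        # looks up the current master and forwards the request.
        return self._masters[entity_type].changes_since(entity_type, since)


crm = Master("crm")
crm.record(Change("Customer", "23414", datetime(2010, 9, 1), {"name": "A"}))
crm.record(Change("Customer", "23415", datetime(2010, 10, 1), {"name": "B"}))

broker = Broker()
broker.set_master("Customer", crm)
updates = broker.sync("Customer", since=datetime(2010, 9, 15))
```

The client never refers to `crm` directly, so the master can be replaced with a call to `set_master` and the client code stays the same.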

Collect reference data

• Most systems share a number of fairly static lists
  – list of countries, list of diagnosis codes, list of provinces, ...

• There is no reason to maintain this in duplicate in different systems
  – the lists also need common identifiers for the items

• The lists might as well go into the reference model
  – can be retrieved from there by client systems

Step #3

Generic lookup service

• Would not work "out of the box"

• Must be carefully set up so that it works

• Possible because this is a controlled environment

[Diagram labels: App #1: "Give me an entity of type X with ID 23414"; Broker: "Who has data about X?"; Service #1; Service #2; Service #3]

Step #4
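The two questions in the diagram ("who has data about X?" and "give me an entity of type X with ID 23414") can be sketched like this. `Service`, `LookupBroker` and the sample record are hypothetical; in the talk the knowledge of who serves what comes from the reference model, which is reduced here to a plain dictionary.

```python
class Service:
    """Hypothetical backend service holding some entities."""

    def __init__(self, name, records):
        self.name = name
        self._records = records   # {(entity_type, entity_id): data}

    def get(self, entity_type, entity_id):
        return self._records.get((entity_type, entity_id))


class LookupBroker:
    def __init__(self):
        self._providers = {}      # entity type -> services that have it

    def register(self, entity_type, service):
        self._providers.setdefault(entity_type, []).append(service)

    def who_has(self, entity_type):
        """Answer 'who has data about X?'."""
        return list(self._providers.get(entity_type, []))

    def lookup(self, entity_type, entity_id):
        """Answer 'give me an entity of type X with this ID'."""
        for service in self.who_has(entity_type):
            found = service.get(entity_type, entity_id)
            if found is not None:
                return found
        return None


service1 = Service("Service #1", {("X", "23414"): {"id": "23414", "label": "example"}})
service2 = Service("Service #2", {})

lookup_broker = LookupBroker()
lookup_broker.register("X", service2)
lookup_broker.register("X", service1)
entity = lookup_broker.lookup("X", "23414")
```

The registrations are the part that has to be "carefully set up": the broker only works because every service in this controlled environment has been described to it.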

Generic translation service

• Sometimes the client wants format X, but the server can only supply Y
  – the broker can find a translator service, and
  – ensure that the translation happens automatically

[Diagram: the Broker and translator services X->Y, Y->Z, X->Z, Y->X, Z->Y, Z->X]

Step #4
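Finding a translator can be treated as a search over registered format pairs. In the sketch below `TranslationBroker` is a hypothetical class and the translators are stand-in functions on strings. Chaining several translators when no direct one exists is an assumption of this sketch, not something the slide states.

```python
from collections import deque


class TranslationBroker:
    def __init__(self):
        self._translators = {}   # (source format, target format) -> function

    def register(self, source, target, translate):
        self._translators[(source, target)] = translate

    def find_path(self, source, target):
        """Breadth-first search for the shortest chain of translators."""
        if source == target:
            return []
        queue = deque([(source, [])])
        seen = {source}
        while queue:
            fmt, path = queue.popleft()
            for (src, dst) in self._translators:
                if src != fmt or dst in seen:
                    continue
                step = path + [(src, dst)]
                if dst == target:
                    return step
                seen.add(dst)
                queue.append((dst, step))
        return None

    def translate(self, document, source, target):
        path = self.find_path(source, target)
        if path is None:
            raise LookupError(f"no translator from {source} to {target}")
        for step in path:
            document = self._translators[step](document)
        return document


translation_broker = TranslationBroker()
# Stand-in translators: real ones would convert between data formats.
translation_broker.register("X", "Y", lambda doc: doc + " as Y")
translation_broker.register("Y", "Z", lambda doc: doc + " as Z")
chained = translation_broker.find_path("X", "Z")
```

Breadth-first search returns the shortest chain, so a direct X->Z translator is preferred as soon as one is registered.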

Some science fiction

• We already have
  – the structure of XML formats X and Y described, and
  – connected to the reference model

• In some cases we can then generate the translator automatically
  – made prototype in 2004
  – it worked!
  – but it won't always work

Step #4

Impact analysis

• If we register clients and their requests in the model we know more about the uses of the architecture

• It becomes possible to find the answer to questions like
  – "can we stop this service?"
  – "does anyone use this format?"
  – ...

Step #4
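A minimal sketch of such a registry, assuming the broker records each request it routes. `UsageRegistry` and its method names are hypothetical, as are the client, service and format names in the example.

```python
from collections import defaultdict


class UsageRegistry:
    """Records which clients use which services and formats."""

    def __init__(self):
        self._service_clients = defaultdict(set)
        self._format_clients = defaultdict(set)

    def record_request(self, client, service, fmt):
        self._service_clients[service].add(client)
        self._format_clients[fmt].add(client)

    def clients_of_service(self, service):
        return set(self._service_clients.get(service, set()))

    def can_stop_service(self, service):
        """'Can we stop this service?' Only if no client is known to use it."""
        return not self.clients_of_service(service)

    def format_in_use(self, fmt):
        """'Does anyone use this format?'"""
        return bool(self._format_clients.get(fmt))


usage = UsageRegistry()
usage.record_request("App #1", "Service #1", "format Y")
usage.record_request("App #4", "Service #1", "format Y")
```

The answers are only as good as the registrations: a client that bypasses the broker is invisible to this analysis.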

Access control

• Rules for this are usually
  – not documented anywhere,
  – encoded in software all over the enterprise

• It can also be represented in the model
  – user groups can be connected to the data they are allowed to access/modify
  – there is no need to represent individual users

• All this can be retrieved via web services

Step #5
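Connecting user groups to data can be sketched as grants on entity types. `Permission` and `AccessModel` are hypothetical names, and the group "case workers" is invented for the example. Individual users appear only as the list of groups they belong to.

```python
from enum import Flag, auto


class Permission(Flag):
    NONE = 0
    ACCESS = auto()
    MODIFY = auto()


class AccessModel:
    def __init__(self):
        self._grants = {}   # (group, entity type) -> Permission

    def grant(self, group, entity_type, permission):
        key = (group, entity_type)
        self._grants[key] = self._grants.get(key, Permission.NONE) | permission

    def allowed(self, groups, entity_type, permission):
        """True if the union of the groups' grants covers `permission`."""
        granted = Permission.NONE
        for group in groups:
            granted |= self._grants.get((group, entity_type), Permission.NONE)
        return (granted & permission) == permission


access = AccessModel()
access.grant("case workers", "Cohabitation", Permission.ACCESS)
access.grant("case workers", "Cohabitation", Permission.MODIFY)
access.grant("auditors", "Cohabitation", Permission.ACCESS)
```

A service would call `allowed` with the caller's groups before returning or changing data, instead of encoding the rule in its own code.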

Generic querying with SPARQL

• More advanced lookup
  – using SPARQL as the query language
  – can do more than just looking up IDs
  – doing queries "into the cloud"

• The reference model is used to interpret the query
  – splits it up and delegates to different services
  – the broker then assembles the result

• SPARQL is not a very powerful language
  – that is why this is possible

Step #6
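The split-and-assemble idea can be illustrated with plain triple patterns. This is a deliberately small sketch, not a SPARQL implementation: it parses no SPARQL syntax, handles only a list of patterns with a constant property, and assumes each property is served by exactly one service. `TripleService`, `QueryBroker` and the sample data are hypothetical.

```python
def is_var(term):
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, else None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


class TripleService:
    """Hypothetical service exposing its data as triples."""

    def __init__(self, triples):
        self._triples = list(triples)

    def solve(self, pattern, binding):
        solutions = []
        for triple in self._triples:
            extended = match(pattern, triple, binding)
            if extended is not None:
                solutions.append(extended)
        return solutions


class QueryBroker:
    def __init__(self):
        self._by_property = {}   # property -> service that has it

    def register(self, prop, service):
        self._by_property[prop] = service

    def query(self, patterns):
        bindings = [{}]
        for pattern in patterns:
            # Split: each pattern goes to the service owning its property.
            service = self._by_property[pattern[1]]
            # Assemble: keep only bindings consistent across services.
            bindings = [b2 for b in bindings
                        for b2 in service.solve(pattern, b)]
        return bindings


people = TripleService([("p1", "name", "Ann"), ("p2", "name", "Bob")])
cohabitations = TripleService([("c1", "cohab 1", "p1")])

query_broker = QueryBroker()
query_broker.register("name", people)
query_broker.register("cohab 1", cohabitations)
rows = query_broker.query([("?c", "cohab 1", "?p"), ("?p", "name", "?n")])
```

The pattern-by-pattern structure is what makes delegation straightforward: each pattern can be answered by one service, and the broker only has to join the variable bindings.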

Semantic formats "on the wire"

• Ordinary XML formats are static
  – semantic formats are dynamic

• New fields and entity types can be added
  – without changing the format
  – without confusing recipients

• Transparent support for subtyping
  – allowing even more flexibility in interpretation

• Support for merging
  – again transparent for recipients

Step #7
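These properties can be illustrated with records that carry their own type and field names. The sketch uses plain dictionaries rather than a real semantic format such as XTM. The helper names, the subtype table and the field "pension scheme" are assumptions made for the example.

```python
# Assumption: subtype relations come from the reference model.
SUPERTYPE = {"Cohabitation PENSION": "Cohabitation"}   # child -> parent


def is_a(type_name, wanted):
    """True if `type_name` is `wanted` or a subtype of it."""
    while type_name is not None:
        if type_name == wanted:
            return True
        type_name = SUPERTYPE.get(type_name)
    return False


def merge(records):
    """Merge records with the same id; the first value of a field wins."""
    merged = {}
    for record in records:
        target = merged.setdefault(record["id"], {})
        for key, value in record.items():
            target.setdefault(key, value)
    return list(merged.values())


def start_dates(records):
    """A recipient that only knows 'Cohabitation' and 'start date'."""
    return {
        r["id"]: r["start date"]
        for r in records
        if is_a(r["type"], "Cohabitation") and "start date" in r
    }


from_service_1 = {"id": "c1", "type": "Cohabitation PENSION",
                  "start date": "2005-01-01"}
from_service_2 = {"id": "c1", "type": "Cohabitation PENSION",
                  "pension scheme": "individual"}   # field unknown to recipient

combined = merge([from_service_1, from_service_2])
dates = start_dates(combined)
```

The recipient keeps working when a sender adds "pension scheme" or sends a subtype it has never seen, because it selects by supertype and by the fields it understands.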

Scenario

[Diagram: App #1 ("Need all X satisfying certain criteria, in format Y") sends SPARQL and formatY to the Broker. Further labels: App #2, Database, Service #1, Service #2; SPARQL requests and XTM responses; Merge and filter; Translator (XTM to formatY)]

Step #7

Conclusion

• Clients don't need to know where the data are
  – much looser coupling
  – much easier to restructure

• Clients do not need to relate to the many data models that are in use
  – instead they need only refer to the reference model

• Clients do not need to worry about the data formats used by servers
  – instead, they simply ask for data in the format they want
