This paper presents an approach to using semantic technologies to achieve better and more flexible integration of IT systems. The author believes that the described approach is applicable to a great many organizations, and that it can lead to far more dynamic IT architectures than what is common today.
1
Semantic integration
TMRA, 2010-10-01 Lars Marius Garshol
2
Architectural challenges
• System architecture is not enterprise architecture
  – what's good for a single system is not necessarily good for the enterprise as a whole
  – local decisions make sense locally, but not necessarily globally
• Organizational reorganizations
  – mergers, acquisitions, political reorganizations, ...
  – these all have implications for IT architecture
• Master data management
  – of course you only have a single customer database
  – ...until you buy a company that has its own
• Architecture dictators
  – are a tempting way to impose some general structure,
  – but generally impede progress at the local level
3
Lego!
• The goal is not just an architecture that's correct today
• Because we know the situation will soon change
• The goal is an architecture that's easily adapted to tomorrow's environment
4
A step-by-step process
1. Reference model
2. Master data
3. Reference data
4. Generic services
5. Access control
6. Search
7. Semantic formats
5
Mapping the IT landscape
• Which IT systems exist?
• Which entities do they have?
• What are their properties?
• What web services exist?
• What is their input and output?
[Diagram labels: App, Entity, Property, Service, Format]
Step #1
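The five questions above suggest a small data model for the map. The following is a minimal sketch only; the class names (`App`, `Entity`, `Service`) mirror the diagram labels, while the example systems ("CaseHandler", "Archive") and the helper `apps_with_entity` are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    properties: list[str] = field(default_factory=list)


@dataclass
class Service:
    name: str
    input_format: str
    output_format: str


@dataclass
class App:
    name: str
    entities: list[Entity] = field(default_factory=list)
    services: list[Service] = field(default_factory=list)


def apps_with_entity(landscape: list[App], entity_name: str) -> list[str]:
    """Return the names of all apps that declare an entity with this name."""
    return [
        app.name
        for app in landscape
        if any(e.name == entity_name for e in app.entities)
    ]


# Hypothetical example data, purely for illustration.
landscape = [
    App(
        "CaseHandler",
        [Entity("case", ["id", "opened"])],
        [Service("getCase", "case-id", "case-xml")],
    ),
    App("Archive", [Entity("case", ["ref", "closed"]), Entity("document", ["title"])]),
]
```

A query such as `apps_with_entity(landscape, "case")` answers "which systems have the concept case?", which is the kind of navigation a structured map allows and a PowerPoint drawing does not.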
6
The value of the map
• Living high-level view of systems and services
  – navigation/search/visualization,
  – connect to relevant documentation, where it exists,
  – far superior to PowerPoint/Word
• The main value is in the structure
  – that is, the use of a proper semantic model
• No need to build or buy software
  – it can all be done with existing open source components
Step #1
7
Problems with the map
• There is no connection across systems
  – shows that 7 systems have the concept "case"
  – but so what?
• What is needed is a cross-mapping
  – must show to what degree the 7 concepts overlap
  – perhaps there are other concepts that mean the same, but have different names?
Step #1
8
Build a reference model
• Model central concepts
  – entities and their properties
  – type hierarchy for both
  – independent of any specific system; model the organization's understanding
• This is not a canonical data model
  – not a format
  – not forcing any systems to actually use the model
  – systems can use entities/properties which do not (yet) exist in the model
Step #1
9
What is cohabitation?
• Law on individual pensions – LOV-2008-06-27-62, § 3-7
  – Here, a cohabitant is understood to be a) a person with whom the customer shares a home and has children in common, or b) a person with whom the customer lives in a marriage-like or partnership-like relationship, when it is shown that the relationship lasted without interruption for the last five years before the customer's death, and there were no circumstances that would have prevented a lawful marriage or registered partnership from being entered into.
• Regulation on collecting information – FOR-2005-07-08-826, point 1
  – cohabitants: persons who live together and have children in common.
• Law on guardianship ("vergemål") – LOV-2010-03-26-9, § 2
  – In this act, cohabitants means two persons who live together in a marriage-like relationship.
• ...
Step #1
10
Modelling
Cohabitation
– Cohabitation PENSION
– Cohabitation INNH
– Cohabitation VERGE
– Cohabitation XXX
– Cohabitation YYY
Step #1
11
Properties, too
[Diagram labels: Cohabitation, Person; properties: cohab 1, cohab 2, start date, end date, ...]
Step #1
12
Connect the system data
App #1: COHAB
App #2: COHABITERS
App #3: BPG_COHAB
Cohabitation
– Cohabitation PENSION
– Cohabitation INNH
– Cohabitation VERGE
– Cohabitation XXX
– Cohabitation YYY
Step #1
13
Degrees of correlation
• Perfect correlation
  – not unusual, but not always the case either
• Specialization
  – that is, B is a narrower concept than A
• Overlap
  – A and B share a common subset
• Resembles
  – A and B are related, but the connection is not clear
[Diagram labels: A, B]
Step #1
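The four degrees of correlation can be recorded as typed links between a system's local concept and a concept in the reference model. This is a sketch under stated assumptions: the names `Correlation`, `Mapping` and `instances_of` are hypothetical, the degrees assigned to COHAB, COHABITERS and BPG_COHAB are invented for illustration, and SPECIALIZATION is read here as "the local concept is narrower than the reference concept".

```python
from dataclasses import dataclass
from enum import Enum


class Correlation(Enum):
    PERFECT = "perfect"                # same concept
    SPECIALIZATION = "specialization"  # local concept is narrower
    OVERLAP = "overlap"                # the two share a common subset
    RESEMBLES = "resembles"            # related, connection unclear


@dataclass(frozen=True)
class Mapping:
    system: str       # application holding the data
    local_name: str   # what that system calls the concept
    concept: str      # concept in the reference model
    degree: Correlation


def instances_of(mappings, concept):
    """Local concepts whose every record is also an instance of `concept`.

    Only perfect matches and specializations qualify; overlapping or
    merely resembling concepts need case-by-case treatment.
    """
    ok = {Correlation.PERFECT, Correlation.SPECIALIZATION}
    return [
        (m.system, m.local_name)
        for m in mappings
        if m.concept == concept and m.degree in ok
    ]


# Invented degrees, for illustration only.
mappings = [
    Mapping("App #1", "COHAB", "Cohabitation", Correlation.PERFECT),
    Mapping("App #2", "COHABITERS", "Cohabitation", Correlation.OVERLAP),
    Mapping("App #3", "BPG_COHAB", "Cohabitation", Correlation.SPECIALIZATION),
]
```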
14
Uses of the model
• We describe to understand
  – we want to understand so that we can improve
• Analysis of the architecture
  – starting point for a restructuring
  – identify master data issues
  – etc
• A data dictionary
  – useful when converting legacy data
  – useful for bug fixing
  – key personnel no longer have to answer questions all the time
  – etc
Step #1
15
From documentation to services
• So far we've only discussed documentation for humans
  – this is highly useful in a number of ways
  – but it is only the beginning
• The model has a semantic structure
  – therefore we can use it to build new kinds of services
Step #1
16
Master data control
• Pick one system to be the master for each kind of data
  – where this can really be centralized
• Other systems needing the data must become clients of the master
  – this is a gradual transition
• We also need a protocol which the clients can use to retrieve the data
Step #2
17
A service broker
• A service which routes requests
• A layer above the ESB
  – the ESB takes care of transport
  – it might also broker between several ESBs
• Uses its knowledge of information and services
• Decouples clients from servers
• Makes the architecture a lot more flexible
[Diagram labels: Broker, Reference model, ESB]
Step #2
18
Master data protocol
• Used by clients to retrieve data updates
• Makes it possible to gradually migrate
• Master can change without the clients knowing
[Diagram labels: Broker; App #1, App #2, App #3, App #4; Sync request: entity + time; Reference model; Lookup; Atom (SDshare)]
Step #2
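The sync request (entity + time) routed through the broker can be illustrated with an in-memory stand-in. This sketch does not implement Atom or SDshare; the classes `Change`, `Master` and `Broker`, their methods, and the integer timestamps are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Change:
    entity_type: str
    entity_id: str
    updated: int   # timestamp; a plain integer tick in this sketch
    data: dict


class Master:
    """A system that is master for some kind of data and logs its changes."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def record(self, change):
        self.log.append(change)

    def changes_since(self, entity_type, since):
        return [
            c for c in self.log
            if c.entity_type == entity_type and c.updated > since
        ]


class Broker:
    """Routes sync requests; the dict stands in for the reference model lookup."""

    def __init__(self):
        self._master_for = {}

    def set_master(self, entity_type, master):
        self._master_for[entity_type] = master

    def sync(self, entity_type, since):
        # The client names only the entity type and a time, never the master.
        return self._master_for[entity_type].changes_since(entity_type, since)
```

Because clients call only `broker.sync(...)`, re-pointing an entity type with `set_master` moves the master to another system without any change on the client side.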
19
Collect reference data
• Most systems share a number of fairly static lists
  – list of countries, list of diagnosis codes, list of provinces, ...
• There is no reason to maintain this in duplicate in different systems
  – the lists also need common identifiers for the items
• The lists might as well go into the reference model
  – can be retrieved from there by client systems
Step #3
20
Generic lookup service
• Would not work "out of the box"
• Must be carefully set up so that it works
• Possible because this is a controlled environment
[Diagram labels: Broker; App #1, App #2, App #3, App #4; Service #1, Service #2, Service #3; "Give me an entity of type X with ID 23414"; "Who has data about X?"]
Step #4
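The exchange in the diagram ("give me an entity of type X with ID 23414", "who has data about X?") could be sketched as follows. The class `LookupBroker` and its methods are hypothetical, and each service is reduced to a callable that returns a record or `None`; a real setup would need the careful configuration the slide mentions.

```python
class LookupBroker:
    def __init__(self):
        # entity type -> list of lookup callables; stands in for the
        # reference model's knowledge of "who has data about X?"
        self._providers = {}

    def register(self, entity_type, lookup):
        self._providers.setdefault(entity_type, []).append(lookup)

    def get(self, entity_type, entity_id):
        """Ask each registered provider in turn; return the first hit or None."""
        for lookup in self._providers.get(entity_type, []):
            result = lookup(entity_id)
            if result is not None:
                return result
        return None


# Hypothetical services, modelled as dictionaries keyed by ID.
service_1 = {"23414": {"type": "X", "id": "23414", "source": "Service #1"}}
service_2 = {"99": {"type": "X", "id": "99", "source": "Service #2"}}

broker = LookupBroker()
broker.register("X", service_1.get)
broker.register("X", service_2.get)
```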
21
Generic translation service
• Sometimes the client wants format X, but the server can only supply Y
  – the broker can find a translator service, and
  – ensure that the translation happens automatically
[Diagram labels: Broker; translators X->Y, Y->Z, X->Z, Y->X, Z->Y, Z->X]
Step #4
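One way a broker might pick a translator is to treat formats as nodes and translators as edges, then search for a path. Chaining several translators when no direct one exists is an assumption of this sketch; the slides only say that the broker finds a translator service. The functions `find_chain` and `translate` are hypothetical names.

```python
from collections import deque


def find_chain(translators, source, target):
    """Breadth-first search for the shortest list of (from, to) hops.

    `translators` maps (from_format, to_format) to a callable.
    Returns [] if no translation is needed and None if none is possible.
    """
    if source == target:
        return []
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, path = queue.popleft()
        for src, dst in translators:
            if src == fmt and dst not in seen:
                hop_path = path + [(src, dst)]
                if dst == target:
                    return hop_path
                seen.add(dst)
                queue.append((dst, hop_path))
    return None


def translate(translators, source, target, document):
    """Apply the chain found by find_chain to the document, hop by hop."""
    chain = find_chain(translators, source, target)
    if chain is None:
        raise LookupError(f"no translation from {source} to {target}")
    for hop in chain:
        document = translators[hop](document)
    return document
```

The client asks only for the target format; which translators run, and in what order, stays inside the broker.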
22
Some science fiction
• We already have
  – the structure of XML formats X and Y described, and
  – connected to the reference model
• In some cases we can then generate the translator automatically
  – a prototype was made in 2004
  – it worked!
  – but it won't always work
Step #4
23
Impact analysis
• If we register clients and their requests in the model, we know more about the uses of the architecture
• It becomes possible to find the answer to questions like
  – "can we stop this service?"
  – "does anyone use this format?"
  – ...
Step #4
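Registering clients and their requests can be as simple as recording who asked for what. The sketch below is illustrative only; `UsageRegistry`, its methods, and the service and format names in the usage example are assumptions, not part of the described architecture.

```python
from collections import defaultdict


class UsageRegistry:
    def __init__(self):
        self._clients_by_service = defaultdict(set)
        self._clients_by_format = defaultdict(set)

    def record(self, client, service, fmt):
        """Note that `client` called `service` and asked for format `fmt`."""
        self._clients_by_service[service].add(client)
        self._clients_by_format[fmt].add(client)

    def clients_of_service(self, service):
        return sorted(self._clients_by_service.get(service, set()))

    def clients_of_format(self, fmt):
        return sorted(self._clients_by_format.get(fmt, set()))

    def can_stop(self, service):
        """True if no registered client has ever used the service."""
        return not self._clients_by_service.get(service)


# Hypothetical usage data.
registry = UsageRegistry()
registry.record("App #1", "Service #1", "formatY")
registry.record("App #2", "Service #1", "XTM")
```

Note that `can_stop` is only as reliable as the registration: a client that bypasses the broker leaves no trace.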
24
Access control
• Rules for this are usually
  – not documented anywhere,
  – encoded in software all over the enterprise
• It can also be represented in the model
  – user groups can be connected to the data they are allowed to access/modify
  – there is no need to represent individual users
• All this can be retrieved via web services
Step #5
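Connecting user groups to the data they may access or modify could look like the sketch below. The class `AccessModel`, the action names, and the example groups are hypothetical; individual users appear only as a list of group memberships, in line with the point that users themselves need not be modelled.

```python
READ, MODIFY = "read", "modify"


class AccessModel:
    def __init__(self):
        self._grants = {}  # (group, entity_type) -> set of allowed actions

    def grant(self, group, entity_type, *actions):
        self._grants.setdefault((group, entity_type), set()).update(actions)

    def allowed(self, groups, entity_type, action):
        """True if any of the caller's groups may perform the action."""
        return any(
            action in self._grants.get((group, entity_type), set())
            for group in groups
        )


# Hypothetical rules.
model = AccessModel()
model.grant("case-workers", "case", READ, MODIFY)
model.grant("auditors", "case", READ)
```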
25
Generic querying with SPARQL
• More advanced lookup
  – using SPARQL as the query language
  – can do more than just looking up IDs
  – doing queries "into the cloud"
• The reference model is used to interpret the query
  – splits it up and delegates to different services
  – the broker then assembles the result
• SPARQL is not a very powerful language
  – that is why this is possible
Step #6
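The split-delegate-assemble idea can be shown without a SPARQL engine. In this heavily simplified sketch a query is just a list of triple patterns `(subject_var, predicate, object_var)`, each predicate is owned by exactly one service, and the broker joins the partial results on shared variables. No real SPARQL parsing is done, and the names `join`, `answer`, `owner_of`, "crm" and "hr" are all hypothetical.

```python
def join(left, right):
    """Natural join of two lists of variable bindings (dicts)."""
    out = []
    for a in left:
        for b in right:
            # Rows are compatible if they agree on every shared variable.
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                out.append({**a, **b})
    return out


def answer(patterns, services, owner_of):
    """Delegate each pattern to the service owning its predicate, then join.

    `services` maps a service name to a callable(predicate) returning
    (subject, object) pairs; `owner_of` maps predicate -> service name
    and stands in for the reference model.
    """
    results = [{}]
    for s_var, predicate, o_var in patterns:
        pairs = services[owner_of[predicate]](predicate)
        bindings = [{s_var: s, o_var: o} for s, o in pairs]
        results = join(results, bindings)
    return results


# Hypothetical data held by two separate services.
crm = {"name": [("p1", "Ada"), ("p2", "Bob")]}
hr = {"worksAt": [("p1", "Acme")]}
services = {"crm": lambda p: crm.get(p, []), "hr": lambda p: hr.get(p, [])}
owner_of = {"name": "crm", "worksAt": "hr"}
```

The client states one query; that `name` and `worksAt` live in different systems is known only to the broker.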
26
Semantic formats "on the wire"
• Ordinary XML formats are static
  – semantic formats are dynamic
• New fields and entity types can be added
  – without changing the format
  – without confusing recipients
• Transparent support for subtyping
  – allowing even more flexibility in interpretation
• Support for merging
  – again transparent for recipients
Step #7
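Two of the properties listed above, subtyping and merging, can be illustrated with plain dictionaries. This is not XTM or any real semantic format; the type hierarchy, the record layout, and the functions `is_a` and `merge` are assumptions made for the sketch.

```python
# Hypothetical type hierarchy: subtype -> supertype.
SUPERTYPE = {"customer": "person", "employee": "person"}


def is_a(type_name, wanted, supertype=SUPERTYPE):
    """True if type_name equals `wanted` or is a (transitive) subtype of it."""
    while type_name is not None:
        if type_name == wanted:
            return True
        type_name = supertype.get(type_name)
    return False


def merge(records):
    """Merge records sharing an 'id'; every other field becomes a set of values.

    Fields a recipient has never seen are simply carried along.
    """
    merged = {}
    for rec in records:
        target = merged.setdefault(rec["id"], {"id": rec["id"]})
        for key, value in rec.items():
            if key != "id":
                target.setdefault(key, set()).add(value)
    return merged


# Hypothetical records arriving from different services.
records = [
    {"id": "p1", "type": "customer", "name": "Ada"},
    {"id": "p1", "type": "customer", "email": "ada@example.org"},
    {"id": "p2", "type": "employee", "name": "Bob", "shoe_size": "42"},
]
```

A recipient that only knows about `person` can still select both records via `is_a`, and the unexpected `shoe_size` field does not break it.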
27
Scenario
[Diagram labels: App #1 ("Need all X satisfying certain criteria, in format Y"); Broker (receives SPARQL, formatY); App #2; Database; Service #1; Service #2; SPARQL; XTM; Merge and filter; Translator (XTM, formatY)]
Step #7
28
Conclusion
• Clients don't need to know where the data are
  – much looser coupling
  – much easier to restructure
• Clients do not need to relate to the many data models that are in use
  – instead they need only refer to the reference model
• Clients do not need to worry about the data formats used by servers
  – instead, they simply ask for data in the format they want