Semantic MediaWiki Approach to Metadata

Presentation given at EDW 2012 (Atlanta)

Scott E. Thompson, Manager - Data Architecture, Ontario Teachers’ Pension Plan

1. Why? 2. SMW? 3. The PoC 4. The Unexpected 5. Wrap Up

Agenda

• Why?
• Mashup of slides I’ve used before…
  - What is Semantic MediaWiki?
  - Proof of Concept
  - The Unexpected
• Wrap Up
• Questions

pinterest.com/thompland777

SELECT ?Person WHERE {
  ?Person :hasExperience :Semantic_Technologies .
  ?Person :hasExperience :Meta_Data .
  ?Person :hasExperience :Capital_Markets .
}

Ontario Teachers’ Pension Plan

• Fixed Income
• Public Equities
• Private Capital
• Real Estate
• Infrastructure
• Foreign Currency
• Commodities
• Hedge Funds

The Challenge: Metadata

Current: Low Confidence

[Diagram labels: IT; ETL; Correct Trade; Reload Data; Rerun Report; Data Warehouse; Data Warehouse Reload; 42?]

Future: Nirvana

Business Requirements

• Findability of Data
• Ownership of Data
• Data Quality
• Consistent Business Terminology

Added later…
• Ownership of Metadata
• Metadata Quality

Business Requirements

• Allows business users and end users to gain the insight they need into what the data and reports they are looking at mean

• Makes data available and visible to others

• Creates a searchable set of information about the firm’s data. This allows data developers and users to search for existing data and avoid data duplication.

• Provides a platform for sharing and publicizing data. This reduces the workload of developers (interfaces, reports, etc.) and users and increases efficiency.

• Quality control, data restrictions and uses can be applied to the entire data set.

• Metadata documentation transcends people and time. Staff turnover and balancing of multiple projects can be mitigated with metadata, providing data permanence and the documentation of institutional knowledge.

Value of Meta Data & Meta Data Tool

MDM?

MDM could stand for Master Data Management or Meta Data Management… coincidence?

“Let’s go get all the key pieces of data and put them in one place, which is really more of an enterprise data warehouse but master data management then says… it’s almost a map… here is what each of those data fields are, here is how you can find them, here is what they mean, here is where they came from.”

Blake Johnson, Consulting Professor, Stanford University
“The Truth and Power of Master Data Management” (Teradata)
http://www.youtube.com/watch?feature=player_embedded&v=p6VHpIlDfu4#!

One Truth?

[Diagram: business functions arranged from Pre-Trade to Post-trade]
• Investment Strategy & Planning
• Trade & Deal Management
• Securities Operations
• Portfolio Accounting
• Performance
• Market Risk Management
• Credit & Counterparty Risk Management
• Liquidity Risk Management
• Compliance
• Collateral & Cash Management
• Portfolio Research & Analytics
• Total Fund Reporting
• Reconciliation

Labels repeated across the diagram: Trades, Market Context, Model, Business Context

V = f(trade, market context, model, business context)

What is a Wiki?

• Hawaiian for “quick”
• Allows large numbers of people to create and edit the same content
• Effective for reaching a credible consensus from a large group
• Wikipedia is the world’s largest collaboratively edited source of encyclopedic knowledge

What is the Semantic Web?

MediaWiki (Web 2.0)

Semantic MediaWiki (Web 3.0)

Future Opportunities

Simple search algorithms would suffice to provide a precise answer to the question…

Faceted Search

Graphs (relate/infer)

[Graph diagram, read as statements]
• otpp:Index-Linked Bond subClassOf otpp:Debt
• otpp:Fixed-Rate Bond subClassOf otpp:Debt
• otpp:Amortizing Index-Linked Bond subClassOf otpp:Index-Linked Bond
• otpp:Index-Linked Bond sameAs dbpedia:Inflation-Linked Bond

Other label on the diagram: otpp:subtypeOf
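
The "relate/infer" idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the PoC: the triples are one reading of the diagram labels (the edge directions are an assumption), and the helper names `superclasses` and `aliases` are hypothetical.

```python
# Illustrative sketch: triples are one reading of the slide's graph labels.
# The edge directions are assumed, and the helper names are hypothetical.
TRIPLES = {
    ("otpp:Index-Linked Bond", "subClassOf", "otpp:Debt"),
    ("otpp:Fixed-Rate Bond", "subClassOf", "otpp:Debt"),
    ("otpp:Amortizing Index-Linked Bond", "subClassOf", "otpp:Index-Linked Bond"),
    ("otpp:Index-Linked Bond", "sameAs", "dbpedia:Inflation-Linked Bond"),
}


def superclasses(term, triples):
    """Return every class reachable from term by following subClassOf edges."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == "subClassOf" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def aliases(term, triples):
    """Return terms linked to term by sameAs, in either direction."""
    linked = set()
    for subject, predicate, obj in triples:
        if predicate != "sameAs":
            continue
        if subject == term:
            linked.add(obj)
        elif obj == term:
            linked.add(subject)
    return linked


if __name__ == "__main__":
    # Inferred: an amortizing index-linked bond is also a kind of debt.
    print(sorted(superclasses("otpp:Amortizing Index-Linked Bond", TRIPLES)))
    # Related: the internal term maps to an external vocabulary entry.
    print(sorted(aliases("otpp:Index-Linked Bond", TRIPLES)))
```

Following subClassOf transitively is what lets a query for "Debt" also find amortizing index-linked bonds, even though nobody stated that fact directly.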

Who Needs Consistency?

Linked Open Data Graph (OLD)

FIBO

Proof of Concept

Build a knowledgebase about:
• Our structured data (schemas, tables, columns)
• Our business terminology (business process, products, attributes)

Prove that the technology could:
• Automatically load technical metadata and relate it with business metadata
• Customize workflow to collect and govern the manual business input
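
As a rough illustration of the first PoC goal, the sketch below turns one table's technical metadata into wiki page text using Semantic MediaWiki's inline annotation syntax, `[[Property::value]]`. It is a sketch under assumptions, not the PoC's loader: the function name `table_page`, the property `HasColumn`, and the sample column names are hypothetical, and how the generated text is pushed into the wiki is left out.

```python
def table_page(table, schema, columns):
    """Build wikitext for one table page using SMW [[Property::value]] annotations.

    The property names are assumptions made for this illustration.
    """
    lines = [
        "[[Category:Table]]",
        f"Part of schema: [[IsPartOfA::{schema}]]",
        "",
        "Columns:",
    ]
    # One annotated bullet per column, so each column becomes queryable.
    lines += [f"* [[HasColumn::{table}.{column}]]" for column in columns]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample input; real input would come from the database catalog.
    print(table_page("Table1", "ACCT", ["col_a", "col_b"]))
```

Because the annotations are plain page text, business users can later add their own properties (definitions, owners) to the same page through the wiki's forms, which is where technical and business metadata meet.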

Data Architecture Ontology

[Ontology diagram]
Classes: Table, Schema, Schema Group
Relationships: IsPartOfA, BelongsToA
Instances: TOOLKIT, CORE, PRODUCT, FUNCTIONAL, BUAD
Instances: ACCT, MREF, MKT, FIQR
Instances: Table1, Table2, View1, View2

Data Management Ontology

[Ontology diagram]
Classes: Table, Quality State, Organizational Group, SLA
Relationships: hasDataOwner, hasDataSteward, hasA
Instances: User, Authoritative
Instances: Investment Division – Asset Mix & Risk, Finance Division – Data Management
Instances: SLA1, SLA2
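
One use of an ontology like this is checking the "Ownership of Data" requirement mechanically. The sketch below lists tables that lack an owner or steward. It assumes that hasDataOwner and hasDataSteward link a Table to an Organizational Group. The function name `governance_gaps` and the sample assignments are hypothetical.

```python
# Assumption: both properties link a Table to an Organizational Group.
REQUIRED = ("hasDataOwner", "hasDataSteward")

# Hypothetical sample data; table and group names are taken from the slides,
# but these particular assignments are invented for illustration.
TABLES = {
    "Table1": {
        "hasDataOwner": "Investment Division – Asset Mix & Risk",
        "hasDataSteward": "Finance Division – Data Management",
    },
    "Table2": {"hasDataSteward": "Finance Division – Data Management"},
    "View1": {},
}


def governance_gaps(tables):
    """Map each table to the required properties it is missing, if any."""
    gaps = {}
    for name, properties in tables.items():
        missing = [prop for prop in REQUIRED if not properties.get(prop)]
        if missing:
            gaps[name] = missing
    return gaps


if __name__ == "__main__":
    for table, missing in governance_gaps(TABLES).items():
        print(f"{table}: missing {', '.join(missing)}")
```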

Workflow

Product Attribute Ontology

[Ontology diagram]
Classes: Product Group, Product, Product Attribute, Stored Procedure, Table, Column, Quality Test
Relationships: belongsToA, hasAttribute, getsDataFrom, hasA, CallsA, ReferencesA, hasDMQualityTest
Instances: Missing, Stale, Null Value, Comparative, Tolerance, Changed

Annotations on the diagram:
• Focus on this data entry form
• Metadata to be curated by DM
• Metadata to be curated by AM&R

% Sourced from Core Schemas?

{{#sparql: SELECT DISTINCT ?Product ?Product_attribute ?Column ?Schema
WHERE {
  ?Product property:HasAttribute ?Product_attribute .
  ?Product_attribute property:GetsDataFrom ?Column .
  ?Column MDM:belongsToSchema ?Schema .
}
|merge=true|link=all}}
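
The percentage in the slide title can be derived from the rows such a query returns. The sketch below is one possible calculation, not the one used in the PoC. It assumes an attribute counts as core-sourced when at least one of its columns sits in a core schema, and that the set of core schemas is supplied separately. The function name `percent_from_core` and all sample values are hypothetical.

```python
def percent_from_core(rows, core_schemas):
    """Percentage of product attributes with at least one column in a core schema.

    rows: (product, attribute, column, schema) tuples, mirroring the query's
    SELECT list. The counting rule is an assumption made for this sketch.
    """
    sourced_from_core = {}
    for product, attribute, _column, schema in rows:
        key = (product, attribute)
        already = sourced_from_core.get(key, False)
        sourced_from_core[key] = already or schema in core_schemas
    if not sourced_from_core:
        return 0.0
    return 100.0 * sum(sourced_from_core.values()) / len(sourced_from_core)


if __name__ == "__main__":
    # Hypothetical result rows and schema names.
    sample_rows = [
        ("ProductA", "Coupon", "SchemaA.Table1.coupon", "SchemaA"),
        ("ProductA", "Maturity", "SchemaB.Table2.maturity", "SchemaB"),
    ]
    print(percent_from_core(sample_rows, core_schemas={"SchemaA"}))
```

Grouping by (product, attribute) before counting keeps an attribute that draws on several columns from being counted more than once.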

Data Management Indexes

It’s a New Kind of Database!

SMW+ in a nutshell

[Stack diagram components]
• WYSIWYG extension
• Enhanced Retrieval Extension
• Deployment Framework
• Semantic MediaWiki
• MediaWiki
• Web Server

“The smartest organizations are not those with the smartest people but those with the quickest access to their collective knowledge”

- Rod Collins (wiki-management.com)