

Apache Atlas & Open Metadata

Dataworks Sydney 2017

Nigel Jones, Software Architect, IBM

Ferd Scheepers, Chief Information Architect, ING

What is Open Metadata and Governance?

Open Metadata and Governance will allow metadata to be captured when the data is created, moved with the data, and augmented and processed by any vendor's tools.

Open Metadata and Governance consists of:

1. A standardized, extensible set of metadata types
2. Metadata exchange APIs and notifications
3. Frameworks for automated governance

Open Metadata and Governance will allow you to have:

1. An enterprise data catalogue that lists all of your data: where it is located, its origin (lineage), owner, structure, meaning, classification and quality
2. New data tools (from any vendor) that connect to your data catalogue out of the box
3. Metadata added automatically to the catalogue as new data is created and analysed
4. Subject matter experts collaborating around the data
5. Automated governance processes that protect and manage your data
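Metadata exchange APIs and notifications are what let metadata be captured when data is created and then reach any vendor's tool. A minimal in-memory publish/subscribe sketch of that flow follows; `MetadataEvent`, `NotificationBus`, the `ENTITY_CREATED` event type and the entity names are hypothetical, and a real deployment would use a message broker rather than direct function calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class MetadataEvent:
    """Hypothetical metadata change event, e.g. 'a data set was created'."""
    event_type: str       # e.g. "ENTITY_CREATED"
    entity_type: str      # e.g. "DataSet"
    qualified_name: str   # unique name of the entity the event describes
    attributes: Dict[str, str] = field(default_factory=dict)


class NotificationBus:
    """In-memory stand-in for a metadata notification channel."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[MetadataEvent], None]] = []

    def subscribe(self, handler: Callable[[MetadataEvent], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: MetadataEvent) -> None:
        # Every subscriber sees the same event, whichever tool produced it.
        for handler in self._subscribers:
            handler(event)


# Usage: a catalogue and a data-quality tool both react to one event.
catalogue: Dict[str, MetadataEvent] = {}
quality_queue: List[str] = []


def update_catalogue(event: MetadataEvent) -> None:
    catalogue[event.qualified_name] = event


def queue_quality_check(event: MetadataEvent) -> None:
    if event.event_type == "ENTITY_CREATED":
        quality_queue.append(event.qualified_name)


bus = NotificationBus()
bus.subscribe(update_catalogue)
bus.subscribe(queue_quality_check)
bus.publish(MetadataEvent("ENTITY_CREATED", "DataSet", "hr.employee_record",
                          {"owner": "HR"}))
```

The producer never needs to know which tools are listening, which is the property that lets new tools join without changes to the systems that create the data.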

Positioning of Apache Atlas for Open Metadata

Figure: Open and Unified Metadata, showing metadata repositories (Apache Atlas, IBM, SAS) on a metadata highway, together with the Open Metadata Repository Service (OMRS) and the Open Metadata Access Service (OMAS): components defined and being developed by the Open Metadata & Governance project.

Role of Apache Atlas

• Apache Atlas provides an open community for developing the reference implementation of open metadata and governance. In essence, Apache Atlas delivers two main capabilities:
  • it acts as a metadata repository (graph database) for metadata end-user tools
  • it delivers the federated/unified metadata layer across the entire landscape of an enterprise
• The software development governance of the Apache Software Foundation (ASF) creates confidence that the technology will be maintained and enhanced as appropriate, in an equitable manner.
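One reason a graph database suits a metadata repository is that lineage questions ("where did this report's data come from?") become graph traversals. The sketch below is an in-memory toy, not Atlas's storage layer: `MetadataGraph`, the `derivedFrom` edge label and the entity names are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List, Set


class MetadataGraph:
    """Toy metadata graph: entities are vertices, relationships are labelled edges."""

    def __init__(self) -> None:
        self.entities: Dict[str, str] = {}               # name -> type name
        self.edges: Dict[str, Dict[str, Set[str]]] = {}  # label -> source -> targets

    def add_entity(self, name: str, type_name: str) -> None:
        self.entities[name] = type_name

    def relate(self, label: str, source: str, target: str) -> None:
        self.edges.setdefault(label, {}).setdefault(source, set()).add(target)

    def upstream(self, name: str, label: str = "derivedFrom") -> List[str]:
        """Breadth-first walk along `label` edges: the entity's origins, nearest first."""
        seen: Set[str] = {name}
        order: List[str] = []
        queue = deque([name])
        while queue:
            current = queue.popleft()
            for nxt in sorted(self.edges.get(label, {}).get(current, set())):
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return order


# Usage: a report derived from a table that was loaded from a raw extract.
graph = MetadataGraph()
graph.add_entity("raw_hr_extract", "DataFile")
graph.add_entity("employee_record", "Table")
graph.add_entity("salary_report", "Report")
graph.relate("derivedFrom", "employee_record", "raw_hr_extract")
graph.relate("derivedFrom", "salary_report", "employee_record")
```

The `seen` set makes the walk safe even if lineage edges ever form a cycle.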

Doing all of this under the Apache Atlas flag is not enough…

… because Apache is mostly focused on development, and we are missing a governance body to manage the adoption of, and compliance with, the Open Metadata and Governance standards. We envision the following roles for ODPI:

1. Be an advocate of the Open Metadata and Governance standards: make them visible and their value understood.
2. Facilitate discussions around the evolution, maintenance and development of the Open Metadata and Governance standards.
3. Test and sign off the compliance of vendor offerings with the Open Metadata and Governance standards.

Who is in?

1. Hands-on community members:
  • ING
  • IBM
  • Hortonworks
2. Companies we have had conversations with:
  • CIBC
  • SAS
  • Microsoft
  • Oracle
  • Informatica
  • Waterline
  • RBC
  • DBS

Timeline and next steps

1. Ambition level:
  • End of September 2017: Open Metadata working demo.
  • Mid-December November 2017: first version of user access.
  • Google for Data
2. Next steps:
  • End of Q2 2018: production-ready version of the Virtual Data Connector.

About Me

https://www.linkedin.com/in/nigelljones

https://www.twitter.com/planetf1


Objective

Why

How

Excite & Engage

Apache Atlas

Open

Metadata

Atlas has graduated!

DOB: 2015-05-05

Release: 0.8.1

Atlas Architecture

Figure: Atlas components: Storage Repository (Graph), Type System, REST API, Models, UI & Apps, Hooks & Bridges.

https://cwiki.apache.org/confluence/display/ATLAS/Open+Metadata+and+Governance
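The REST API in this architecture is how UI & Apps and external tools reach the Type System and the stored entities. The sketch below only builds request URLs, assuming an Atlas server on its default port 21000 exposing the v2 REST endpoints `/types/typedefs`, `/entity/guid/{guid}` and `/search/dsl`; the helper names are hypothetical, and the paths should be checked against the REST documentation for your Atlas release.

```python
import base64
import json
import urllib.request
from urllib.parse import quote, urlencode

# Assumption: Atlas on its default port, serving the v2 REST API.
ATLAS_BASE = "http://localhost:21000/api/atlas/v2"


def typedefs_url(base: str = ATLAS_BASE) -> str:
    """All type definitions known to the Type System."""
    return f"{base}/types/typedefs"


def entity_url(guid: str, base: str = ATLAS_BASE) -> str:
    """One entity, addressed by its GUID (percent-encoded for safety)."""
    return f"{base}/entity/guid/{quote(guid, safe='')}"


def dsl_search_url(query: str, base: str = ATLAS_BASE) -> str:
    """A DSL search; the query string is URL-encoded."""
    return f"{base}/search/dsl?{urlencode({'query': query})}"


def get_json(url: str, user: str, password: str) -> dict:
    """GET a URL with HTTP basic auth and decode the JSON body.

    Needs a running Atlas server, so it is not called in the examples here.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}", "Accept": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping URL construction separate from the network call makes the request shapes easy to check without a server; against a live instance one would call, for example, `get_json(typedefs_url(), user, password)`.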

A reminder of our problem… and solution

Open and Unified Metadata

Extend beyond Hadoop

Common Core Data Model

• Base Types, Systems & Infrastructure
• Data Assets
• Governance
• Lineage
• Glossary
• Collaboration
• Models & Reference Data
• Metadata Discovery

https://cwiki.apache.org/confluence/display/ATLAS/Building+out+the+Open+Metadata+Typesystem
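The common core model rests on an extensible type system: new types extend existing ones and inherit their attributes, so a vendor or site can add specialised types without breaking tools that only understand the base types. The sketch below shows that inheritance rule; the type names only loosely echo Atlas's `Referenceable`, `Asset` and `DataSet`, and `TypeDef`, `TypeRegistry` and `EmployeeTable` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TypeDef:
    name: str
    super_types: List[str] = field(default_factory=list)
    attributes: Dict[str, str] = field(default_factory=dict)  # attribute -> data type


class TypeRegistry:
    def __init__(self) -> None:
        self._types: Dict[str, TypeDef] = {}

    def register(self, typedef: TypeDef) -> None:
        # A type may only extend types that are already registered.
        missing = [s for s in typedef.super_types if s not in self._types]
        if missing:
            raise ValueError(f"unknown super types: {missing}")
        self._types[typedef.name] = typedef

    def all_attributes(self, name: str) -> Dict[str, str]:
        """Attributes of a type, including those inherited from its super types."""
        typedef = self._types[name]
        merged: Dict[str, str] = {}
        for parent in typedef.super_types:
            merged.update(self.all_attributes(parent))
        merged.update(typedef.attributes)  # a type's own attributes win
        return merged

    def is_a(self, name: str, ancestor: str) -> bool:
        if name == ancestor:
            return True
        return any(self.is_a(p, ancestor) for p in self._types[name].super_types)


# Usage: base types plus one site-specific extension.
reg = TypeRegistry()
reg.register(TypeDef("Referenceable", attributes={"qualifiedName": "string"}))
reg.register(TypeDef("Asset", ["Referenceable"], {"name": "string", "owner": "string"}))
reg.register(TypeDef("DataSet", ["Asset"]))
reg.register(TypeDef("EmployeeTable", ["DataSet"], {"retentionDays": "int"}))
```

A tool that only knows `Asset` can still read `name` and `owner` from an `EmployeeTable`, because `is_a` and `all_attributes` follow the super-type chain.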

Open APIs - OMRS

• Metadata Highway
• Adapter
• Plugin
• Open Connector Framework

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=70258803
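OMRS joins repositories from different vendors onto the metadata highway through adapters that translate each repository's native form into a common shape. The sketch below shows that adapter-and-federation pattern in miniature; `RepositoryConnector`, the two adapters, `MetadataHighway` and the `home` field are hypothetical names, not the OMRS interfaces.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

Result = Dict[str, str]


class RepositoryConnector(ABC):
    """Common interface every repository adapter implements (hypothetical)."""

    @abstractmethod
    def find_by_name(self, text: str) -> List[Result]:
        """Entities whose name contains `text`, in the common result shape."""


class DictRepositoryAdapter(RepositoryConnector):
    """Adapter for a repository that stores name -> type."""

    def __init__(self, repo_id: str, store: Dict[str, str]) -> None:
        self.repo_id = repo_id
        self.store = store

    def find_by_name(self, text: str) -> List[Result]:
        return [{"name": name, "type": type_name, "home": self.repo_id}
                for name, type_name in self.store.items()
                if text.lower() in name.lower()]


class TupleRepositoryAdapter(RepositoryConnector):
    """Adapter for a repository with another native shape: (type, name) rows."""

    def __init__(self, repo_id: str, rows: List[Tuple[str, str]]) -> None:
        self.repo_id = repo_id
        self.rows = rows

    def find_by_name(self, text: str) -> List[Result]:
        return [{"name": name, "type": type_name, "home": self.repo_id}
                for type_name, name in self.rows
                if text.lower() in name.lower()]


class MetadataHighway:
    """Fans a query out to every connected repository and merges the answers."""

    def __init__(self) -> None:
        self.members: List[RepositoryConnector] = []

    def connect(self, member: RepositoryConnector) -> None:
        self.members.append(member)

    def find_by_name(self, text: str) -> List[Result]:
        merged: List[Result] = []
        for member in self.members:
            merged.extend(member.find_by_name(text))
        return sorted(merged, key=lambda r: (r["name"], r["home"]))


# Usage: one Atlas-like store and one other vendor's store on the same highway.
highway = MetadataHighway()
highway.connect(DictRepositoryAdapter("atlas", {"employee_record": "Table", "sales": "Table"}))
highway.connect(TupleRepositoryAdapter("vendor_x", [("Report", "Employee Headcount")]))
```

Each result keeps a `home` field naming the repository it came from, so a caller can see where a piece of metadata is mastered.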

Open APIs - OMAS

• OMRS
• Governance Engine OMAS
• Glossary OMAS
• Asset OMAS
• Information View OMAS
• …

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=70258799

OMAS – detail

• Project List Metadata Service
• Community Metadata Service
• Landscape Definition Metadata Service
• Asset Catalog Metadata Service
• Classification and Mapping Metadata Service
• Information View Metadata Service
• Connector Directory Metadata Service
• Governance Definitions Metadata Service
• Information Process Metadata Service
• Glossary and Taxonomy Metadata Service
• Asset Metadata Service
• Discovery Metadata Service
• Governance Action Metadata Service
• Roles and Access Metadata Service
• Models and Schema Metadata Service

Diagram labels: Data/Asset, Connector

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=70258799
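Each access service (OMAS) offers a consumer-oriented API for one kind of tool, implemented on top of the generic repository services. A minimal sketch of that facade idea, assuming a toy repository of typed entities and classifications; `GenericRepository`, `AssetAccessService` and `asset_summary` are hypothetical.

```python
from typing import Dict, List, Optional


class GenericRepository:
    """Toy stand-in for generic repository services."""

    def __init__(self) -> None:
        self.entities: Dict[str, Dict[str, str]] = {}
        self.classifications: Dict[str, List[str]] = {}

    def create_entity(self, guid: str, type_name: str, properties: Dict[str, str]) -> None:
        self.entities[guid] = {"type": type_name, **properties}

    def classify(self, guid: str, classification: str) -> None:
        self.classifications.setdefault(guid, []).append(classification)


class AssetAccessService:
    """Hypothetical consumer-facing facade over the generic repository."""

    def __init__(self, repository: GenericRepository) -> None:
        self._repository = repository

    def asset_summary(self, guid: str) -> Optional[Dict[str, object]]:
        """What an asset consumer usually wants, gathered in one call."""
        entity = self._repository.entities.get(guid)
        if entity is None:
            return None
        return {
            "name": entity.get("name", ""),
            "type": entity["type"],
            "owner": entity.get("owner", "unknown"),
            "classifications": sorted(self._repository.classifications.get(guid, [])),
        }


# Usage
repo = GenericRepository()
repo.create_entity("guid-1", "DataSet", {"name": "employee_record", "owner": "HR"})
repo.classify("guid-1", "Sensitive")
assets = AssetAccessService(repo)
```

The consuming tool works only with `asset_summary`; how entities and classifications are stored underneath can change without affecting it.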

New glossary function for semantic processing

Figure: business metadata (glossary terms Employee, Employee Id, Employee Name, Work Location, Job Title, Annual Salary, Hourly Pay Rate and Manager Compensation Plan, connected by HAS-A and IS-A relationships, plus a Sensitive marker) linked to structural metadata for a data store (an EMPLOYEE RECORD table with columns EMPNAME, EMPNO, JOBCODE and SALARY).

https://cwiki.apache.org/confluence/display/ATLAS/Area+3+-+Glossary
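The glossary figure links business terms to physical columns, and IS-A relationships let a column inherit meaning, such as sensitivity, from its term. The sketch assumes, for illustration only, that SALARY is assigned to Annual Salary, that JOBCODE is assigned to Job Title, and that Annual Salary and Hourly Pay Rate are the terms linked IS-A to Sensitive; the dictionaries and function names are hypothetical.

```python
from typing import Dict, List, Set

# Assumed IS-A links between glossary terms (an illustrative reading of the figure).
IS_A: Dict[str, List[str]] = {
    "Annual Salary": ["Sensitive"],
    "Hourly Pay Rate": ["Sensitive"],
}

# Assumed semantic assignments: physical column -> glossary term.
ASSIGNED_TERM: Dict[str, str] = {
    "EMPLOYEE_RECORD.EMPNAME": "Employee Name",
    "EMPLOYEE_RECORD.EMPNO": "Employee Id",
    "EMPLOYEE_RECORD.JOBCODE": "Job Title",
    "EMPLOYEE_RECORD.SALARY": "Annual Salary",
}


def ancestors(term: str) -> Set[str]:
    """All terms reachable from `term` by following IS-A links."""
    found: Set[str] = set()
    stack = list(IS_A.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(IS_A.get(parent, []))
    return found


def meaning(column: str) -> Set[str]:
    """A column's business meaning: its assigned term plus every IS-A ancestor."""
    term = ASSIGNED_TERM.get(column)
    return set() if term is None else {term} | ancestors(term)


def sensitive_columns() -> List[str]:
    """Columns whose meaning includes Sensitive, found through the glossary alone."""
    return sorted(c for c in ASSIGNED_TERM if "Sensitive" in meaning(c))
```

Nothing is tagged on the column itself: marking a term as Sensitive once covers every column assigned to it.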

Glossary

• Replacing v1 Taxonomy (tech preview)
• Categories
• Terms
• Hierarchies
• Rich relationships
• Classifications

https://cwiki.apache.org/confluence/display/ATLAS/Area+3+-+Glossary
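Categories organise glossary terms into hierarchies, and a term may be filed under more than one category. A minimal sketch of category paths follows; `Category`, `Glossary`, the category names and the assumption that category names are unique within a glossary are illustrative, not the Atlas glossary API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Category:
    name: str
    parent: Optional["Category"] = None
    terms: List[str] = field(default_factory=list)

    def path(self) -> str:
        """Qualified path from the root category, e.g. 'HR/Compensation'."""
        return self.name if self.parent is None else f"{self.parent.path()}/{self.name}"


@dataclass
class Glossary:
    name: str
    categories: Dict[str, Category] = field(default_factory=dict)

    def add_category(self, name: str, parent: Optional[str] = None) -> Category:
        category = Category(name, self.categories[parent] if parent else None)
        self.categories[name] = category
        return category

    def categorize(self, term: str, category: str) -> None:
        self.categories[category].terms.append(term)

    def term_paths(self, term: str) -> List[str]:
        """Every qualified path a term is filed under (a term may sit in several)."""
        return sorted(f"{c.path()}/{term}" for c in self.categories.values()
                      if term in c.terms)


# Usage
glossary = Glossary("Enterprise")
glossary.add_category("HR")
glossary.add_category("Compensation", parent="HR")
glossary.add_category("Payroll")
glossary.categorize("Annual Salary", "Compensation")
glossary.categorize("Annual Salary", "Payroll")
```

Filing a term in two categories does not copy it; both paths point at the same term.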

Open Discovery Framework

• Open framework
• Plugins characterize data & relationships
• Updates metadata with results
• Initial implementation in master

https://cwiki.apache.org/confluence/display/ATLAS/Automated+metadata+discovery
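In the Open Discovery Framework, plugins characterise data and the framework writes their results back as metadata. The sketch below shows that loop with two toy plugins; the plugin signature, the 80% threshold in `email_plugin`, the annotation names and the sample values are assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List, Optional

# A discovery plugin inspects sample values and returns an annotation or None.
Plugin = Callable[[List[str]], Optional[str]]

EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def email_plugin(values: List[str]) -> Optional[str]:
    """Annotate a column when at least 80% of non-empty values look like e-mail addresses."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return None
    hits = sum(1 for v in non_empty if EMAIL.match(v))
    return "EmailAddress" if hits / len(non_empty) >= 0.8 else None


def numeric_plugin(values: List[str]) -> Optional[str]:
    """Annotate a column when every non-empty value parses as a number."""
    non_empty = [v for v in values if v]

    def is_number(v: str) -> bool:
        try:
            float(v)
            return True
        except ValueError:
            return False

    return "Numeric" if non_empty and all(is_number(v) for v in non_empty) else None


def run_discovery(columns: Dict[str, List[str]], plugins: List[Plugin],
                  catalogue: Dict[str, List[str]]) -> None:
    """Run every plugin over every column and write annotations back to the catalogue."""
    for column, sample in columns.items():
        for plugin in plugins:
            annotation = plugin(sample)
            if annotation is not None:
                catalogue.setdefault(column, []).append(annotation)
```

New plugins can be added to the list without changing `run_discovery`, which is the point of keeping the framework open.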

Governance Action Framework

• Metadata drives enforcement
• Classification (tag) based: scalable, glossary driven
• Access, masking, filtering
• Supports Apache Ranger, with open APIs for others
• Audit, rights: exception management, rights, privacy (to look at in future)

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=70258801
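Tag-based enforcement means policies are written against classifications rather than against individual columns, so newly tagged data is covered without writing new policies. Apache Ranger provides this through its own tag-based policies; the sketch below is not Ranger's API, only the idea, with hypothetical tags (`PII`, `Sensitive`), actions and a privileged `hr_admin` group.

```python
from typing import Dict, List, Set

# Column -> classification tags, e.g. derived from glossary assignments (hypothetical).
COLUMN_TAGS: Dict[str, Set[str]] = {
    "EMPNAME": {"PII"},
    "SALARY": {"Sensitive"},
    "JOBCODE": set(),
}

# Tag-based policy: tag -> action for users outside the privileged group.
POLICY: Dict[str, str] = {"PII": "mask", "Sensitive": "deny"}


def enforce(rows: List[Dict[str, str]], user_groups: Set[str],
            privileged: str = "hr_admin") -> List[Dict[str, str]]:
    """Apply masking and filtering decisions driven only by column tags."""
    if privileged in user_groups:
        return [dict(row) for row in rows]
    result = []
    for row in rows:
        new_row: Dict[str, str] = {}
        for column, value in row.items():
            actions = {POLICY[t] for t in COLUMN_TAGS.get(column, set()) if t in POLICY}
            if "deny" in actions:
                continue  # filter the column out entirely
            new_row[column] = "****" if "mask" in actions else value
        result.append(new_row)
    return result


rows = [{"EMPNAME": "Jane Doe", "SALARY": "50000", "JOBCODE": "27"}]
```

Tagging a new column `PII` is enough for it to be masked; `POLICY` itself does not change.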

Open ecosystem

https://cwiki.apache.org/confluence/display/ATLAS/Open+Metadata+and+Governance

Summary

• Open Metadata
• Enterprise Catalog
• Discovery
• Multi Vendor
• Open, Layered APIs
• Metadata store integration
• Open Source & Governance
• Ubiquitous
• Standard Models

How can I get involved?

• Discuss: Mailing List
• Document, Explain: Wiki
• Report, Design: Jira
• Face to face
• Code
• Vendors!

https://cwiki.apache.org/confluence/display/ATLAS/Getting+Involved

Governance & Security BOF

Thursday 18:00, C4.7

Backup

VDC End to End

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=69407333