© 2014 IBM Corporation Governance in a (Big) Data environment The Data Reservoir Johan Huizingalaan 765 1066 VH Amsterdam The Netherlands November 2014 Ron van der Starre, Information Architect, IBM Software Group

Grow your business with combined data sources


DESCRIPTION

The trusted Data Reservoir: how can you combine all your data sources to grow your business? You have more and more data. Alongside your traditional process data with BI reporting, the volume of semi-structured data, such as machine data and data from external sources, keeps growing. If you could use all available data in your analyses, this would probably lead to growth of your business. But then you must be able to trust the results. In this session we explain how an integrated data platform emerges, based on a sound and transparent 'Information Supply Chain'. This platform consists of 'fit for purpose' data repositories for traditional data storage and new forms of data. Clear 'Data Governance' based on metadata will increase trust in data and reveal new insights. This session gives you guidelines on how to set up this Information Supply Chain and the Data Reservoir.

Citation preview

Page 1: Grow your business with combined data sources

Governance in a (Big) Data environment: The Data Reservoir

Ron van der Starre, Information Architect, IBM Software Group

Johan Huizingalaan 765, 1066 VH Amsterdam, The Netherlands

November 2014

© 2014 IBM Corporation

Page 2: Grow your business with combined data sources

Most organizations struggle to get started with (Big) Data

Most organizations do not have the in-house expertise

Accelerating initial success and demonstrating business value is key to gaining organizational support

Page 3: Grow your business with combined data sources

The reality of information management lacking governance today

Page 4: Grow your business with combined data sources

Top sources of information used as part of initial efforts – typically start with data already being captured

Source: The real world use of Big Data, IBM & University of Oxford

Data sources

Respondents with active big data efforts were asked which data sources are currently being collected and analyzed as part of active big data efforts within their organization.

Data source | Global respondents | Financial services respondents
Transactions | 88% | 92%
Log Data | 73% | 81%
Events | 59% | 70%
Emails | 57% | 65%
Social Media | 43% | 27%
Sensors | 42% | 19%
External Feeds | 42% | 36%
RFID Scans or POS Data | 41% | 47%
Free-form Text | 41% | 32%
Geospatial | 40% | 0%
Audio | 38% | 21%
Still Images / Videos | 34% | 22%

Page 5: Grow your business with combined data sources

Data blues & skills issues

A disproportionate share of the time spent in an analytics project goes to data preparation: acquiring, preparing, formatting and normalizing the data

In addition to raw data, augmented data/analytical assets can significantly speed up the analytics process and partially bridge the talent gap

Page 6: Grow your business with combined data sources

A growing demand …

Business Teams want

• Open access to more information

• More powerful analysis and visualization tools

IT Teams are

• Concerned about cost

• Concerned about governance and regulatory requirements

Page 7: Grow your business with combined data sources

How to manage your existing and new data initiatives

Treat data as a high-value company asset

– Define a strategy on how to use data as a differentiator

– Embed the strategy firmly in the organization

Enable the value of data through strong governance principles

– Strong empowerment by senior management (or CDO)

– Define clear and efficient procedures and policies to ensure compliance with risk and security requirements

Improve trust in, and understanding of, data

– Define clear and agreed business terms for data

– Profile data to assess and baseline data quality

– Clear reporting on progress and data quality initiatives

– Data quality as an everyday job
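As an illustration of the "profile data to assess and baseline data quality" step, the following is a minimal Python sketch that uses only the standard library. The function names, the metrics chosen and the sample records are assumptions made for this example; they are not taken from the presentation or from any IBM product.

```python
from collections import Counter


def profile_column(rows, column):
    """Return simple quality metrics for one column of a list of dict records."""
    values = [row.get(column) for row in rows]
    total = len(values)
    # Treat None and blank strings as missing values.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    return {
        "column": column,
        "rows": total,
        "missing": total - len(present),
        "completeness": len(present) / total if total else 0.0,
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


def baseline(rows):
    """Profile every column that appears in the records."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column(rows, col) for col in columns}


# Hypothetical sample records, invented for illustration only.
customers = [
    {"id": 1, "country": "NL", "email": "a@example.com"},
    {"id": 2, "country": "NL", "email": ""},
    {"id": 3, "country": "BE", "email": None},
    {"id": 4, "country": None, "email": "d@example.com"},
]

report = baseline(customers)
```

Storing such a report per data set and comparing it over time is one simple way to support the reporting on data quality progress mentioned above.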

Page 8: Grow your business with combined data sources

A business scenario - enhanced 360º view of the customer data

Who? What? Why? How?

Behavioral data
- Orders
- Transactions
- Payment history
- Usage history

Descriptive data
- Attributes
- Characteristics
- Relationships
- Self-declared info
- (Geo)demographics

Attitudinal data
- Opinions
- Preferences
- Needs and Desires

Interaction data
- Email / chat transcripts
- Call center notes
- Web Click-streams
- In person dialogues

Page 9: Grow your business with combined data sources

Other business scenarios we see

Subject matter experts want access to their organization’s data to explore the content, select, control, annotate and access information using their terminology with an underpinning of protection and governance.

Data Scientist seeking data for new analytics models.

Marketeer seeking data for new campaigns.

Fraud investigator seeking data to understand the details of suspicious activity.

• Day-to-day activity.

• Requiring ad hoc access to a wide variety of data sources.

• Supporting analysis and decision making.

• Using the subject matter experts' terminology.

Page 10: Grow your business with combined data sources

The Vision Statement for the Data Reservoir

Enable an organization to operate as one for all platforms, functions and clients to have an agile and self-service operating model with trust and confidence across traditional and new sources of data.

Enablers

1. Agile and self-service
• Find Information
• Access Information
• Provision Information
• Integrate (Cleanse, Transform, Enact, Match, Enhance)
• Project and enhance
• Hypothesis Validation
• Model and report generation
• Archive / Remove / Revive
• Refine
• Curation

2. Trust and confidence
• Information lifecycle and governance
• Data quality
• Reference data
• Entity matching and resolution
• Lineage/Provenance
• Classification
• Regulatory compliance reports

3. Traditional and new sources of information
• New types of repositories, tools and processors
• Heterogeneous information virtualization
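To make the "entity matching and resolution" enabler more concrete, here is a small hedged sketch in Python. It groups records whose names are nearly identical after normalization, using `difflib.SequenceMatcher` from the standard library. The greedy single-pass clustering, the 0.85 threshold and the sample records are assumptions chosen for illustration; production matching engines use far richer rules.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case the name, drop dots and commas, and collapse whitespace."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 of the two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_records(records, threshold=0.85):
    """Greedily group records whose name resembles a cluster's first record."""
    clusters = []
    for record in records:
        for cluster in clusters:
            if similarity(record["name"], cluster[0]["name"]) >= threshold:
                cluster.append(record)
                break
        else:
            # No existing cluster was close enough: start a new one.
            clusters.append([record])
    return clusters


# Hypothetical records from two source systems, invented for illustration.
records = [
    {"source": "crm", "name": "Acme Corp."},
    {"source": "billing", "name": "ACME corp"},
    {"source": "crm", "name": "Globex B.V."},
]

clusters = match_records(records)
```

In this sample the two spellings of the first company normalize to the same string and land in one cluster, while the third record starts a cluster of its own.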

Page 11: Grow your business with combined data sources

What is a Data Lake/Reservoir?

A data reservoir is a data lake that provides data to an organization for a variety of analytics processing including:

– Discovery and exploration of data

– Simple ad hoc analytics

– Complex analysis for business decisions

– Reporting

– Real-time analytics

It is possible to deploy analytics into the data reservoir to generate additional insight from the data loaded into the data reservoir.

A data reservoir manages shared repositories of information for analytical purposes.

Each Data Reservoir Repository is optimized for a particular type of processing.

– Real-time analytics, deep analytics (such as data mining), exploratory analytics, OLAP, reporting, …

Data values may be replicated in multiple repositories in the data reservoir. However, the data reservoir ensures the copying and updating of this data is managed and governed using well-defined information supply chains.

Information in the data reservoir can be accessed through different types of interfaces and provisioning mechanisms provided by the Data Reservoir Services.

[Diagram labels: Data Reservoir; Information Management and Governance Fabric; Data Reservoir Services; Data Reservoir Repositories]

Page 12: Grow your business with combined data sources

Governance - differing user perspectives

Information Governance Catalogue: curation of metadata about stores, models and definitions

• Search for, locate and download data and related artifacts.

• Provision Sand Boxes.

• Add additional insight into data sources through automated analysis.

• Develop data management models and implementations.

• Define governance policies, rules and classifications. Monitor compliance.

• View lineage (business and technical) and perform impact analysis.

[Diagram labels: Data Stores (three times); Sand Box]
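Viewing lineage and performing impact analysis both come down to walking a graph of data flows. The sketch below shows that idea in plain Python. The `LineageGraph` class and the asset names are hypothetical and only illustrate the technique; they do not represent the Information Governance Catalogue's data model or API.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph of data flows between named assets."""

    def __init__(self):
        self._downstream = defaultdict(set)
        self._upstream = defaultdict(set)

    def add_flow(self, source, target):
        """Record that data flows from source to target."""
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _walk(start, edges):
        # Breadth-first traversal that collects every reachable asset.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def impact(self, asset):
        """Assets that may be affected when `asset` changes (downstream)."""
        return self._walk(asset, self._downstream)

    def lineage(self, asset):
        """Assets that `asset` was derived from (upstream)."""
        return self._walk(asset, self._upstream)


# Hypothetical flows, invented for illustration only.
graph = LineageGraph()
graph.add_flow("crm.customers", "staging.customers")
graph.add_flow("staging.customers", "warehouse.customer_dim")
graph.add_flow("web.clickstream", "warehouse.customer_dim")
graph.add_flow("warehouse.customer_dim", "report.churn")

affected = graph.impact("staging.customers")
origins = graph.lineage("report.churn")
```

Here `affected` answers "what breaks if the staging table changes?" and `origins` answers "where did the churn report's data come from?".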

Page 13: Grow your business with combined data sources

Information Governance Overview

Governance Activity Type | Information Exchange | Role that technology can play
Communication | Policies & Metrics | Delivering education, best practices, assessments, templates.
Compliance | Design Changes | Implementing control points and enforcement points. Support for design and code reviews. Test Data Management.
Exception | Exception Requests | Exception process management, incident reporting.
Feedback | Measurements | Dashboards and reports on compliance.
Vitality | New Requirements | Change process management.

Successful Information Governance is implemented with a combination of:

• Skilled people, correct roles and organization

• Processes that create a pragmatic, targeted and agile work environment.

• Standards, templates and assets that improve consistency between implementations.

• Technology that automates classification, enforcement, validation, and correction of data.
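As a rough illustration of technology that automates classification, the Python sketch below assigns a label to a column when enough of its values match a pattern. The rule set, the labels and the 0.8 threshold are assumptions made up for this example; real governance tooling ships with much larger, curated classifier libraries.

```python
import re

# Hypothetical classification rules: label -> pattern a value must fully match.
RULES = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "dutch_postcode": re.compile(r"\d{4}\s?[A-Z]{2}"),
    "iso_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def classify_column(values, threshold=0.8):
    """Return the first label matched by at least `threshold` of the non-empty values."""
    present = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not present:
        return "unclassified"
    for label, pattern in RULES.items():
        hits = sum(1 for v in present if pattern.fullmatch(v))
        if hits / len(present) >= threshold:
            return label
    return "unclassified"


# Hypothetical column samples, invented for illustration only.
emails = ["a@example.com", "b@example.org", None]
postcodes = ["1066 VH", "1012AB"]
dates = ["2014-11-01", "2014-11-02", "n/a", "2014-11-04", "2014-11-05"]
free_text = ["hello", "world"]

labels = [classify_column(col) for col in (emails, postcodes, dates, free_text)]
```

A label assigned this way could then drive the enforcement side, for example by masking every column classified as an email address before it is provisioned to a sandbox.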

Page 14: Grow your business with combined data sources

Three lifecycles of information governance

[Diagram labels: Policy, Operations, Development, Metadata]

Page 15: Grow your business with combined data sources

InfoSphere Integration and Governance features

Define terms and policies

Information Profiling

Job Creation

Data Modeling

Mapping

Metadata Exploration

Compliance Reports

Model information supply chains

Rule Definition

Rule Execution

Exception Management

Information Maintenance

Matching

De-duplication

Information Provisioning

Discovery of Structure

Lineage

[Diagram labels: Policy, Operations, Development, Metadata]

Page 16: Grow your business with combined data sources

Data Reservoir system context diagram

[Diagram: system context of the Data Reservoir. The recoverable labels are listed below.]

Users and systems around the Data Reservoir:
• Line of Business Applications
• Decision Model Management
• Governance, Risk and Compliance Team
• Simple, Ad Hoc Discovery and Analytics
• Reporting
• Information Curator
• Data Reservoir Operations
• Enterprise IT
• System of Record Applications
• Front Office Applications
• Back Office Applications
• Enterprise Service Bus
• New Sources
• Third Party Feeds
• Third Party Services
• Support Services
• Mobile and other Channels
• Internal Sources
• Other Data Reservoirs

Interactions with the Data Reservoir:
• Events to Evaluate
• Information Service Calls
• Data Feed In / Data Feed Out
• Search Requests
• Report Queries
• Understand Information Sources
• Deploy Decision Models
• Deploy Real-time Decision Models
• Understand Compliance
• Report Compliance
• Advertise Information Source
• Information Federation Calls
• Inter-lake Exchange
• Curation Interaction
• Management
• Notifications
• Data Import / Data Export

Page 17: Grow your business with combined data sources

Data Reservoir major subsystems

[Diagram: the system context diagram from Page 16, with the major subsystems shown inside the Data Reservoir. Subsystem labels:]

• Catalog Interfaces
• Advanced Data Provisioning
• Data Refineries
• Analyst Interaction
• Information Integration & Governance
• Data Reservoir Repositories

The surrounding users, systems and interactions carry the same labels as on Page 16, with "Other Data Lakes" shown alongside "Other Data Reservoirs".

Page 18: Grow your business with combined data sources

Start small, think big …

[Diagram: the subsystem diagram from Page 17 with a highlighted example. Labels in the example:]

• Information Integration & Governance
• Analyst Interaction
• Access
• Find
• CATALOG
• INFORMATION VIEWS
• Information Ingestion
• Information Access
• INFORMATION BROKER
• OPERATIONAL GOVERNANCE HUB
• STAGING AREAS
• Harvested Data
• DEEP DATA
• Descriptive Data
• INFORMATION WAREHOUSE
• Front Office Applications
• Internal Sources
• Simple, Ad Hoc Discovery and Analytics

Page 19: Grow your business with combined data sources

Tools to address practical challenges managing Big Data

InfoSphere BigInsights for Hadoop

• For data at rest

• 100% standard Hadoop

• IBM Big SQL, BigSheets

• Developer tools, Accelerators

• Ease of use for all roles

InfoSphere Information Server

• For all data integration requirements

• Business-driven Information Governance Catalog

• Sustainable data quality

• Governance Dashboard

Page 20: Grow your business with combined data sources

Tools to address practical challenges managing Big Data

InfoSphere Watson Explorer

• Enterprise Search engine

• Discover and explore structured and unstructured data

Page 21: Grow your business with combined data sources

Five key findings and key success criteria

1. Start with existing sources of internal data that must be captured and maintained anyway

2. Focus on how to generate increased customer insights in support of an existing initiative

3. Determine up front what KPIs you are trying to impact and how you will deliver business value

4. Success depends upon a scalable and extensible platform, with security and governance

5. Delivering analytical insights faster is a differentiator and provides business value

Page 22: Grow your business with combined data sources

Governing and Managing Big Data for Analytics and Decision Makers

[Diagram: the Data Reservoir system context diagram, as on Page 16.]

http://www.redbooks.ibm.com/redpieces/abstracts/redp5120.html?Open

Page 23: Grow your business with combined data sources