Big Data Modeling, Hans Hultgren, RMDC Fall 2016


Page 1: Big Data Modeling

BIG DATA MODELING

Hans Hultgren

RMDC Fall 2016

Page 2: Big Data Modeling

Welcome

Page 3: Big Data Modeling

AGENDA

1) Big Data

2) Data Modeling

3) Big Data Modeling

Page 4: Big Data Modeling

Session Objectives

• Big Data Fundamentals
– Components of Big Data
– Structure & Schemas
– Tools & Architecture

• Data Modeling
– Integration & History
– Data Warehousing & BI
– Conceptual to Physical

• Big Data Modeling
– Focus on Meaning

• Ensemble Modeling
– The Blended Architecture

Page 5: Big Data Modeling

BIG DATA

Page 6: Big Data Modeling

Big Data

“Huge” Data Volumes

n-Structured & Very Complex

Streaming & Shape-Shifting

[Figure: Typical Data compared with Big Data; panels labeled A, B, C]

Page 7: Big Data Modeling

Big Data

• Volume: Huge Volumes of Data

• Velocity: Drinking from a Fire Hose

• Variety: n-Structured Data

• Veracity: Quality, Accuracy, Reliability, Trustworthiness

• Value: Business Value and Value Potential

Page 8: Big Data Modeling

Big Data Architecture

• To deal with the features of Big Data, supporting architectural components are based on:

– Data distribution, and

– Late Binding of Schemas

KVP (key-value pairs)
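As a rough illustration of data distribution over key-value pairs, the sketch below spreads pairs across a fixed number of nodes by hashing the key. The names `partition_for`, `distribute`, and `lookup`, the sample keys, and the choice of SHA-256 are all assumptions made for this example; the slides do not prescribe any particular partitioning scheme. Values are stored as opaque strings, so no schema is bound at load time.

```python
import hashlib

def partition_for(key: str, num_nodes: int) -> int:
    """Pick a node for a key using a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

def distribute(pairs, num_nodes):
    """Place each (key, value) pair on exactly one node; values stay uninterpreted."""
    nodes = [dict() for _ in range(num_nodes)]
    for key, value in pairs:
        nodes[partition_for(key, num_nodes)][key] = value
    return nodes

def lookup(nodes, key):
    """Find a value by recomputing which node holds the key."""
    return nodes[partition_for(key, len(nodes))].get(key)

# Illustrative data only: values are raw strings, not parsed at load time.
pairs = [
    ("customer:1001", '{"name": "Ada"}'),
    ("customer:1002", '{"name": "Grace", "city": "Denver"}'),
    ("sale:9", '{"total": 42.5}'),
]
nodes = distribute(pairs, 3)
```

Because the hash is deterministic, any reader can locate a key without a central index, and the meaning of each value can be decided later by whoever reads it.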

Page 9: Big Data Modeling

Modeling and Understanding

• Schema on Write

• Schema on Read

• Dismantled Schema on Write

• Schema on Focus

• Schema on Leverage
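The first two items in this list can be contrasted with a minimal sketch. It assumes a hypothetical two-field customer schema and JSON as the raw format; neither comes from the slides, and the remaining three items are not covered because the deck does not define them. Schema on Write validates before storing; Schema on Read stores the raw text and applies a schema only when the data is consumed.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
CUSTOMER_SCHEMA = {"customer_id": int, "name": str}

def write_with_schema(store, record):
    """Schema on Write: reject records that do not fit before they are stored."""
    for field, ftype in CUSTOMER_SCHEMA.items():
        if not isinstance(record.get(field), ftype):
            raise ValueError(f"field {field!r} missing or not {ftype.__name__}")
    store.append({field: record[field] for field in CUSTOMER_SCHEMA})

def write_raw(store, raw_text):
    """Schema on Read, load side: keep the raw text untouched."""
    store.append(raw_text)

def read_with_schema(store, schema):
    """Schema on Read, query side: parse and coerce only the fields asked for."""
    for raw_text in store:
        doc = json.loads(raw_text)
        yield {f: ftype(doc[f]) for f, ftype in schema.items() if f in doc}

warehouse, lake = [], []
write_with_schema(warehouse, {"customer_id": 7, "name": "Ada"})
write_raw(lake, '{"customer_id": "7", "name": "Ada", "twitter": "@ada"}')
rows = list(read_with_schema(lake, CUSTOMER_SCHEMA))
```

In this sketch the lake accepts the extra `twitter` field and the string-typed id without complaint; the reader decides later which fields matter and how to type them.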

[Figure: LOAD, MODEL, APPLY, EXPLORE]

Page 10: Big Data Modeling

Modeling and Understanding

• Big Data Possibilities

[Figure: LOAD, MODEL, APPLY, EXPLORE]

Page 11: Big Data Modeling

Inconvenient Truth about BIG DATA

http://community.embarcadero.com/blogs/entry/the-hidden-elephant-in-big-data-modeling

Page 12: Big Data Modeling

DATA MODELING

Page 13: Big Data Modeling

Data Modeling

Man's Search for Meaning…

• Conceptual Modeling

• Logical Modeling

• Information Modeling

• Physical Data Modeling

Page 14: Big Data Modeling

Ensemble Modeling™

All the parts of a thing taken together, so that each part is considered only in relation to the whole.

• The constellation of component parts acts as a whole.

• With Ensemble Modeling the Core Business Concepts that we define and model are represented as a whole – an ensemble – including all of the component parts. An Ensemble is typically based on all things defining a Core Business Concept that can be uniquely and specifically said for one instance of that Concept.

EMF

Page 15: Big Data Modeling

Forms of Modeling & Ensemble

[Figure: Forms of modeling. Ensemble forms (EMF): Anchor, Focal Point, Data Vault, DV2.0, 2G, Hyper Agility, Temporal, 6NF, Matter, etc. Other labels: 3NF, Dimensional; ERP, Acctg, Sales; EDW; three DataMarts.]

Page 16: Big Data Modeling

The Data Vault Ensemble


• The Data Vault Ensemble conforms to a single key – embodied in the Hub construct.

• The component parts for the Data Vault Ensemble include:

– Hub: The Natural Business Key

– Link: The Natural Business Relationships

– Satellite: All Context, Descriptive Data and History
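The three component parts can be sketched as simple data structures. This is a minimal illustration under assumptions, not a physical Data Vault design: the class fields (`concept`, `business_key`, `load_date`, `context`) and the sample Customer and Sale values are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Tuple, Union

@dataclass(frozen=True)
class Hub:
    """Holds only the natural business key of one Core Business Concept."""
    concept: str
    business_key: str

@dataclass(frozen=True)
class Link:
    """Holds only a relationship between two or more Hubs."""
    name: str
    hubs: Tuple[Hub, ...]

@dataclass
class Satellite:
    """Holds context for a Hub or Link; each load adds a row, so history is kept."""
    parent: Union[Hub, Link]
    load_date: date
    context: Dict[str, str]

# Illustrative instances only.
customer = Hub("Customer", "C-1001")
sale = Hub("Sale", "S-9")
customer_sale = Link("CustomerSale", (customer, sale))
history = [
    Satellite(customer, date(2016, 1, 5), {"city": "Denver"}),
    Satellite(customer, date(2016, 9, 1), {"city": "Boulder"}),
]
latest = max(history, key=lambda s: s.load_date)
```

Note that the Hub and Link carry no descriptive attributes at all; everything that can change over time lives in Satellite rows attached to them.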

Page 17: Big Data Modeling

Ensemble means thinking differently

[Figure: two diagrams, each labeled Customer]

• The minimal construct then for an “entity” such as “Customer” is now (in data vault) a Hub with a set of Satellites
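To make the shift concrete, the sketch below rebuilds a single "Customer" view from one Hub and a set of Satellites as of a given date. The satellite names (`contact`, `profile`), the attribute values, and the `current_view` helper are hypothetical and exist only for this example.

```python
from datetime import date

# One Hub: just the business key.
customer_hub = {"business_key": "C-1001"}

# A set of Satellites: each is a list of (load_date, context) rows.
customer_satellites = {
    "contact": [
        (date(2016, 1, 5), {"email": "ada@example.com"}),
        (date(2016, 6, 1), {"email": "ada@newmail.example"}),
    ],
    "profile": [
        (date(2016, 1, 5), {"name": "Ada", "segment": "Retail"}),
    ],
}

def current_view(hub, satellites, as_of):
    """Assemble the 'entity' by taking the latest row per Satellite up to as_of."""
    view = {"business_key": hub["business_key"]}
    for rows in satellites.values():
        valid = [row for row in rows if row[0] <= as_of]
        if valid:
            view.update(max(valid, key=lambda row: row[0])[1])
    return view

spring = current_view(customer_hub, customer_satellites, date(2016, 3, 1))
year_end = current_view(customer_hub, customer_satellites, date(2016, 12, 31))
```

The familiar single-row "Customer" is therefore a derived view here, while the stored form keeps the key and each strand of context separate.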

Page 18: Big Data Modeling

Applying data vault modeling pattern


Page 19: Big Data Modeling

Data Vault Ensemble Modeling Process

1) Identify and Model the Core Business Concepts

• Business interviews are at the heart of this step

What do you do? What are the main things you work with?

• Find the best/target Natural Business Key

Page 20: Big Data Modeling

Data Vault Ensemble Modeling Process

2) Identify and Model the Natural Business Relationships

• Specific Unique Relationships

• Be mindful of the Unit of Work and Grain


Page 21: Big Data Modeling

Data Vault Ensemble Modeling Process

3) Analyze and Design the Context Satellites

• Consider Rate of Change, Type of Data and also the Sources
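One simple way to act on these three considerations is to group attributes that share a source, a rate of change, and a type of data into the same Satellite. The sketch below does exactly that; the grouping rule, the attribute list, and the category labels (`slow`, `fast`, `descriptive`, `measure`) are assumptions for illustration, and a real design would weigh these factors with more judgment.

```python
from collections import defaultdict

# Hypothetical attribute inventory for a Customer concept.
attributes = [
    {"name": "name", "source": "CRM", "rate_of_change": "slow", "data_type": "descriptive"},
    {"name": "segment", "source": "CRM", "rate_of_change": "slow", "data_type": "descriptive"},
    {"name": "balance", "source": "Billing", "rate_of_change": "fast", "data_type": "measure"},
    {"name": "last_login", "source": "Web", "rate_of_change": "fast", "data_type": "measure"},
]

def design_satellites(attrs):
    """Group attribute names by (source, rate of change, type of data)."""
    groups = defaultdict(list)
    for attr in attrs:
        key = (attr["source"], attr["rate_of_change"], attr["data_type"])
        groups[key].append(attr["name"])
    return dict(groups)

satellite_plan = design_satellites(attributes)
```

Under this rule the slow-changing CRM attributes share one Satellite, while the fast-changing Billing and Web attributes each get their own, so a frequent change in one source does not force new rows for unrelated context.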


Page 22: Big Data Modeling

BIG DATA MODELING

Page 23: Big Data Modeling

Logical business model

• Leveraged for all logical model needs including the data warehouse, big data lake, master data management (MDM) and operational integration initiatives

• Closely aligned to DV physical model

Ensemble Logical Form (ELF)

[Figure: model diagrams with the concepts Customer, Region, Store, Sale, Vendor, Product, Sale LI, Employee]

Page 24: Big Data Modeling

Ensemble Logical Form

[Figure: model diagram with the concepts Customer, Region, Store, Sale, Vendor, Product, Sale LI, Employee]

ELF Modeling maintained in:

* Metadata

* Logical Data Model

* Data Modeling Tools

* Virtual Schemas

* Other Tools or Artifacts

Map to Context Data stored in:

* JSON Docs

* XML (w/ XSD or Not)

* Blobs (Free Form Text)

* Big Data Platforms

* Hadoop

* In the Cloud
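The two lists above can be read as a model held in metadata that points at context stored elsewhere. The sketch below keeps a tiny logical model as a Python dictionary and uses it to index raw JSON documents by business key. The structure of `ELF_MODEL`, the key field names, and the sample documents are hypothetical; the slides do not specify a metadata format.

```python
import json

# Hypothetical logical model held as metadata: concepts and their business keys.
ELF_MODEL = {
    "concepts": {
        "Customer": {"business_key": "customer_id"},
        "Sale": {"business_key": "sale_id"},
    },
    "relationships": [("Customer", "Sale")],
}

# Context data stays in its original form (raw JSON documents here).
documents = [
    '{"customer_id": "C-1001", "name": "Ada", "loyalty": {"tier": "gold"}}',
    '{"sale_id": "S-9", "customer_id": "C-1001", "total": 42.5}',
]

def map_context(model, raw_docs):
    """Index each document under every concept whose business key it carries."""
    index = {concept: {} for concept in model["concepts"]}
    for raw in raw_docs:
        doc = json.loads(raw)
        for concept, spec in model["concepts"].items():
            key = doc.get(spec["business_key"])
            if key is not None:
                index[concept].setdefault(key, []).append(doc)
    return index

context_index = map_context(ELF_MODEL, documents)
```

The sale document carries both keys, so it is reachable from Customer C-1001 and from Sale S-9, while the nested `loyalty` structure is carried along without ever being modeled.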

Page 25: Big Data Modeling

Three Paths for Modeling

Structured / Known

• CBC

• NBR

• Attribution

• Columns

Results in a backbone model with attributes in defined columns

N-Structured / NVP

• CBC

• NBR

• Attribution

Results in a backbone model with known/expected attribute names/tags

N-Structured / KVP

• CBC

• NBR

Results in a backbone model with capacity to capture unknown attribution either named/tagged or not
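One possible reading of the three paths is sketched below for a single incoming record: attributes with defined columns go to a fixed structure, attributes with expected names are kept as name-value pairs, and anything else is captured as free key-value pairs. The function `route`, the column and name lists, and the sample record are all assumptions for illustration, reading CBC as Core Business Concept and NBR as Natural Business Relationship; the deck itself presents the paths as modeling options rather than a routing rule.

```python
# Hypothetical configuration for a Customer backbone.
BUSINESS_KEY = "customer_id"
KNOWN_COLUMNS = ("name", "city")      # Structured / Known: defined columns
EXPECTED_NAMES = {"email", "phone"}   # N-Structured / NVP: expected names or tags

def route(record):
    """Split one record into defined columns, expected NVPs, and unknown KVPs."""
    columns = {col: record.get(col) for col in KNOWN_COLUMNS}
    nvp = {k: v for k, v in record.items() if k in EXPECTED_NAMES}
    kvp = {
        k: v for k, v in record.items()
        if k != BUSINESS_KEY and k not in KNOWN_COLUMNS and k not in EXPECTED_NAMES
    }
    return {"business_key": record[BUSINESS_KEY], "columns": columns, "nvp": nvp, "kvp": kvp}

routed = route({
    "customer_id": "C-1001",
    "name": "Ada",
    "email": "ada@example.com",
    "favorite_color": "green",
})
```

In all three cases the backbone (the business key, and by extension the relationships built on it) is the same; only the treatment of the attribution differs.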

Page 26: Big Data Modeling

APPLYING THE ENSEMBLE

Integration across Platforms

Page 27: Big Data Modeling

Expanded Applications

[Figure: model diagram with the concepts Customer, Region, Store, Sale, Vendor, Product, Sale LI, Employee]

Page 28: Big Data Modeling

Summary

Ensemble in the Big Data World

• Conceptual Modeling

• Logical Modeling

• Information Modeling

• Physical Data Modeling

• Integration Platform
