( Big ) Data Management Storage Global Concepts in 5 slides 2016 Nicolas SARRAMAGNA https://fr.linkedin.com/pub/nicolas-sarramagna/19/941/587



CONTENTS

Introduction

What / Why

How

References

COMPAGNIE PLASTIC OMNIUM

CONFIDENTIAL

Storage in Data Management

DATA MANAGEMENT: multiple modules

BIG DATA: Velocity, Volume, Variety, Veracity, Value

Collect

Storage

Data Mining / Machine Learning

Data Viz

Governance

Security

Master Data

Data quality


Storage – What / Why

STORAGE = ABILITY TO

Store all types of data

structured data (as in an RDBMS)

semi-structured data (XML and JSON formats, mails, HTML pages, logs, sensor data)

unstructured data (text, files, videos, images)

Volume: storage of terabytes to petabytes

Enrich and categorize data with metadata
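A minimal sketch of what "categorize and enrich with metadata" could look like in practice. The extension-to-category mapping, the `CatalogEntry` type, the paths and the tag names are all assumptions made for illustration; they are not taken from the slides or from any particular data lake product.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to the three categories on the slide.
STRUCTURE_BY_EXTENSION = {
    ".csv": "structured",        # assumed: tabular extract from an RDBMS
    ".xml": "semi-structured",
    ".json": "semi-structured",
    ".html": "semi-structured",
    ".log": "semi-structured",
    ".txt": "unstructured",
    ".mp4": "unstructured",
    ".png": "unstructured",
}


@dataclass
class CatalogEntry:
    """One stored object plus the metadata that describes it."""
    path: str
    structure: str
    tags: dict = field(default_factory=dict)


def categorize(path: str, **tags: str) -> CatalogEntry:
    """Derive the structure category from the extension and attach free-form tags."""
    suffix = PurePosixPath(path).suffix.lower()
    # Unknown extensions fall back to "unstructured" (an arbitrary choice here).
    structure = STRUCTURE_BY_EXTENSION.get(suffix, "unstructured")
    return CatalogEntry(path=path, structure=structure, tags=dict(tags))


entry = categorize("/lake/raw/plant_a/sensors_2016-03.json",
                   source="plant_a", owner="quality")
print(entry.structure, entry.tags)
```

A real catalog would usually inspect content rather than trust the extension; the point here is only that every stored object carries a category and descriptive tags.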

STORAGE = MAKES IT POSSIBLE TO

Explore data across sources, run analysis and data mining -> new insights, broken silos

Deliver business data as self-service

Relieve the RDBMS and Data Warehouse of cold data and binary data

Support the RDBMS and Data Warehouse as a staging area for unstructured content

STORAGE = USE OF A DATA LAKE

Complements the existing data architecture, does not replace it

Large-scale storage repository

Volume - Variety


Storage – Data lake location

[Diagram: where the data lake sits in the architecture. Only the labels survive; the exact arrows are not recoverable.]

Existing tools & solutions (traditional)

- Operational data sources (structured data)
- Feed -> Data Warehouse (RDBMS, schema on write)
- Feed -> DataMarts, dedicated to specific needs: analysis, performance, security
- query -> dashboards

Complete the architecture

- Data sources not yet used for BI
- Feed (push / pull) -> Data Lake: unstructured, semi-structured, structured, metadata (RDBMS / NoSQL, schema on read)
- Data Lake <-> Data Warehouse: Feed, Bind, Archive
- Feed -> DataMarts
- query -> dashboards, new business applications
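The "Schema on write" and "Schema on read" labels in the diagram can be illustrated with a small sketch. It assumes a hypothetical two-column warehouse table (`part_id`, `weight_kg`) and JSON lines as the raw format; the function names are invented for this example and do not belong to any real product.

```python
import json

REQUIRED_COLUMNS = ("part_id", "weight_kg")  # hypothetical warehouse schema


def write_to_warehouse(table: list, record: dict) -> None:
    """Schema on write: validate and shape the record before it is stored."""
    missing = [c for c in REQUIRED_COLUMNS if c not in record]
    if missing:
        raise ValueError(f"rejected at write time, missing columns: {missing}")
    table.append({"part_id": str(record["part_id"]),
                  "weight_kg": float(record["weight_kg"])})


def write_to_lake(files: list, raw_line: str) -> None:
    """Schema on read: store the raw payload untouched."""
    files.append(raw_line)


def read_from_lake(files: list, wanted: tuple) -> list:
    """Apply a schema only at query time; skip lines that do not fit it."""
    rows = []
    for line in files:
        doc = json.loads(line)
        if all(key in doc for key in wanted):
            rows.append({key: doc[key] for key in wanted})
    return rows


warehouse, lake = [], []
raw = ['{"part_id": "B-12", "weight_kg": 3.4}',
       '{"part_id": "B-13", "color": "black"}']
for line in raw:
    write_to_lake(lake, line)  # the lake accepts every line as-is
    try:
        write_to_warehouse(warehouse, json.loads(line))
    except ValueError as err:
        print(err)  # the second record does not fit the warehouse schema

print(len(warehouse), len(lake))
print(read_from_lake(lake, ("part_id", "color")))
```

The record without `weight_kg` never reaches the warehouse, yet it stays available in the lake for a later question (here, part colors) that nobody anticipated when the warehouse schema was designed.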


Storage – How – Technical perspective

USE OF HADOOP

Standard product for setting up a data lake

HADOOP

Distributed file system (cluster) + parallel processing engine + additional tools (see image below)

Relies not on a collection of servers but on a collection of co-located CPU, RAM and local disks: commodity hardware, low cost

Horizontal elasticity: master node / data nodes architecture

Shared nothing: each data node is independent. Because HDFS replicates every block on several data nodes (three by default), no data is lost when a single node breaks down.

Design for failure: when a node breaks down, the cluster continues to work
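The "no data is lost" and "design for failure" points can be sketched with a toy simulation. It assumes a simple round-robin placement of block replicas, which is a deliberate simplification: real HDFS placement is rack-aware and decided by the master node. The node and block names are hypothetical.

```python
REPLICATION = 3  # HDFS default replication factor


def place_blocks(block_ids, nodes, replication=REPLICATION):
    """Toy placement: copy each block to `replication` consecutive, distinct nodes."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for index, block in enumerate(block_ids):
        start = index % len(nodes)
        placement[block] = {nodes[(start + i) % len(nodes)]
                            for i in range(replication)}
    return placement


def readable_blocks(placement, failed):
    """Blocks that still have at least one replica on a live node."""
    down = set(failed)
    return {block for block, holders in placement.items() if holders - down}


nodes = ["dn1", "dn2", "dn3", "dn4"]
blocks = [f"blk_{i}" for i in range(6)]
placement = place_blocks(blocks, nodes)

# One or two failed data nodes: every block is still readable somewhere.
print(readable_blocks(placement, ["dn2"]) == set(blocks))
print(readable_blocks(placement, ["dn2", "dn4"]) == set(blocks))
# Three simultaneous failures exceed what three replicas can absorb.
print(sorted(set(blocks) - readable_blocks(placement, ["dn1", "dn2", "dn3"])))
```

With three replicas per block, the cluster tolerates the loss of any two nodes in this toy setup; data only becomes unreachable once every node holding a given block is down.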


Storage – How – Functional perspective

MANAGE THE DATA LAKE SO IT DOES NOT BECOME A SWAMP

Start small and smart, do not bring everything in

Classification of data (through metadata) is mandatory

Data can be queried at every layer
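One way to enforce "classification is mandatory" is to refuse any dataset that arrives without a minimum set of metadata. The sketch below assumes a hypothetical policy of three mandatory keys and an in-memory catalog; the class, key names and paths are illustrative only.

```python
MANDATORY_METADATA = {"owner", "source", "classification"}  # hypothetical policy


class DataLakeCatalog:
    """In-memory catalog that only accepts datasets with the mandatory metadata."""

    def __init__(self):
        self._entries = {}

    def register(self, path, metadata):
        missing = MANDATORY_METADATA - set(metadata)
        if missing:
            raise ValueError(f"{path}: missing mandatory metadata {sorted(missing)}")
        self._entries[path] = dict(metadata)

    def find(self, **criteria):
        """Return the paths whose metadata match every given key/value pair."""
        return sorted(
            path for path, meta in self._entries.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        )


catalog = DataLakeCatalog()
catalog.register("/lake/raw/crm/contacts.csv",
                 {"owner": "sales", "source": "crm", "classification": "internal"})
try:
    catalog.register("/lake/raw/misc/dump.bin", {"owner": "unknown"})
except ValueError as err:
    print(err)  # the unclassified dump is refused instead of silently stored

print(catalog.find(classification="internal"))
```

Rejecting at the door keeps the lake searchable: anything that is stored can be found again through its metadata.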

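[Diagram: layers of the data lake. Only the labels survive; the exact layout is not recoverable.]

Data Lake layers: Raw data, Qualified Data, Integrated Data, Collaborative Data

Aggregated, summarized, classification data

Cross-cutting functions: ILM (Information Lifecycle Management), Metadata (operational metadata, contextual metadata), Data quality, Governance & Security

Consumers (query): Operational Reports, BI Self Service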


Storage – References

REFERENCES

https://www.linkedin.com/pulse/how-create-data-lake-vivek-kumar-singh?trk=pulse-det-nav_art

http://fr.slideshare.net/mrm0/building-the-enterprise-data-lake-a-look-at-architecture

http://www.kdnuggets.com/2015/09/data-lake-vs-data-warehouse-key-differences.html

http://fr.slideshare.net/CasertaConcepts/hadoop-and-your-data-warehouse

https://www.safaribooksonline.com/library/view/hdinsight-essentials-/9781784399429/ch05s06.html