What Next for DBAs in the Big Data Era - AIOUG Next... · •Hadoop Basics •NoSQL Databases...

What Next for DBAs in the Big Data Era

February 21st, 2015

Satyendra Kumar Pasalapudi

Associate Practice Director – IMS @ Apps Associates

Co Founder & President of AIOUG

@pasalapudi

Agenda

• Technology Trends

• Big Data Overview

• Hadoop Basics

• NoSQL Databases

• Big Data SQL

• What Next for DBAs

Big Data Challenge

Cost-effectively manage and analyze all available data in its native form (unstructured, structured, streaming) from sources such as ERP, CRM, websites, network switches, social media and billing.

Trend #1: The End of “One Size Fits All”

History of Databases

[Figure: timeline spanning pre-computer technologies (printing press, Dewey decimal system, punched cards) and the decades from 1940–50 to 2000–2010. It covers magnetic tape, “flat” (sequential) files, magnetic disk, the Indexed Sequential Access Method (ISAM), the hierarchical and network models, the definition of the relational model, ADABAS, System R, Oracle V2, Ingres, Informix, Sybase, SQL Server, Access and Postgres. The most recent entries are Cassandra, Hadoop, Vertica, Dynamo, MongoDB, VoltDB and Aerospike.]

• The 3rd Platform drives new demands on the database:
– Global high availability
– Data volumes
– Unstructured data
– Transaction rates
– Latency
• A single architecture cannot meet all of these demands

Enterprise Big Data Architecture

• Operational RDBMS (Oracle, SQL Server, …)
• In-memory analytics (HANA, Exalytics, …)
• In-memory processing (Spark)
• Hadoop
• Web DBMS (MySQL, MongoDB, Cassandra)
• ERP and in-house CRM
• Analytic/BI software (SAS, Tableau)
• Web server data
• Warehouse RDBMS (Oracle, Teradata, …)

Oracle Engineered Systems

Trend #2: Big Data and Hadoop

The biggest IT inflection point of our generation: Cloud, Mobile, Social and Big Data

Characteristics of Big Data

The instrumented human

• Bluetooth Personal Area Network

• 3G/WiFi Wide Area Network

• GPS

• Storage

• Pulse, temp monitor

• Silent alarms

• Pedometer, sleep monitoring

• Compass

• Camera

• Mic/earphones

• Heads-up display

• Emotion/Attention monitor

Operational vs. Analytical Databases

Google Software Architecture (circa 2005)
• Google Applications
• MapReduce, BigTable
• Google File System (GFS)

MapReduce

[Figure: a job starts, fans out to several parallel Map tasks, and their output is combined by Reduce tasks]
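The Map and Reduce flow can be sketched in plain Python as the classic word count. This is a single-process illustration only, and the function names are hypothetical. Real Hadoop tries to run each map task on a node holding its input split, and it moves the grouped data across the network during the shuffle.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(split: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group all intermediate values by key, keys in sorted order."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # On a cluster each split is mapped on a different node; here they run in sequence.
    intermediate = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "move compute to data"]))
    # {'big': 2, 'clusters': 1, 'compute': 1, 'data': 2, 'move': 1, 'to': 1}
```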

Hadoop Design Principles

• System shall manage and heal itself

– Automatically and transparently route around failure

– Speculatively execute redundant tasks if certain nodes are detected to be slow

• Performance shall scale linearly

– Proportional change in capacity with resource change

• Compute should move to data

– Lower latency, lower bandwidth

• Simple core, modular and extensible
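A toy scheduler can illustrate the principle that compute should move to data. It prefers to run each map task on a node that already holds a replica of its block, and it falls back to any free node otherwise. The function and its inputs are hypothetical. Hadoop's real schedulers also consider rack locality and scheduler policy, which this sketch ignores.

```python
from __future__ import annotations


def assign_tasks(block_replicas: dict[str, list[str]], free_slots: dict[str, int]) -> dict[str, str]:
    """Assign one map task per block, preferring a node that stores a replica.

    block_replicas: block id -> nodes holding a copy (hypothetical input)
    free_slots: node -> number of task slots still free (hypothetical input)
    """
    slots = dict(free_slots)
    assignment: dict[str, str] = {}
    for block, replicas in block_replicas.items():
        # 1. Data-local: run where the block already is (no network transfer).
        local = [n for n in replicas if slots.get(n, 0) > 0]
        if local:
            node = local[0]
        else:
            # 2. Fallback: any node with a free slot; the block must cross the network.
            remote = [n for n, free in slots.items() if free > 0]
            if not remote:
                raise RuntimeError("no free slots left")
            node = remote[0]
        slots[node] -= 1
        assignment[block] = node
    return assignment


if __name__ == "__main__":
    replicas = {"b1": ["n1", "n2"], "b2": ["n2", "n3"], "b3": ["n1", "n3"]}
    print(assign_tasks(replicas, {"n1": 1, "n2": 1, "n3": 0, "n4": 1}))
    # {'b1': 'n1', 'b2': 'n2', 'b3': 'n4'}  (b3 runs remotely: its replica holders are busy)
```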

Hadoop History

• Dec 2004 – Google MapReduce paper published (the GFS paper appeared in Oct 2003)

• July 2005 – Nutch uses MapReduce

• Feb 2006 – Starts as a Lucene subproject

• Apr 2007 – Yahoo! on 1000-node cluster

• Jan 2008 – An Apache Top Level Project

• Jul 2008 – A 4000 node test cluster

• May 2009 – Hadoop sorts a petabyte in 17 hours

Hadoop Ecosystem

• HDFS (Hadoop Distributed File System)
• HBase (key-value store)
• MapReduce (job scheduling/execution system), with Streaming/Pipes APIs
• Data access: Sqoop, Flume
• Client access: Hue, Hive (SQL), Pig (PL/SQL-like scripting)
• Data mining: Mahout
• Foundation: OS (Red Hat, SUSE, Ubuntu, Windows), commodity hardware, Java Virtual Machine, networking, orchestration
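The Streaming/Pipes APIs let MapReduce jobs be written in languages other than Java. With Hadoop Streaming, the mapper and reducer are ordinary programs. Each reads lines on standard input and writes tab-separated key/value lines on standard output, and the framework sorts mapper output by key before the reducer sees it. Below is a minimal sketch that counts hits per site, assuming a hypothetical "user<TAB>site" log format:

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Assumed input format (hypothetical): "user<TAB>site" per line.
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 2:
            yield f"{parts[1]}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    # Streaming hands the reducer mapper output sorted by key,
    # so all lines for one key arrive next to each other.
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(v) for _, v in group)}"


if __name__ == "__main__" and len(sys.argv) == 2:
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

It can be tried locally without a cluster as `cat logs.txt | python job.py map | sort | python job.py reduce`. Here `job.py` and `logs.txt` are placeholder names, and `sort` plays the role of the shuffle.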

HDFS Distributions

Hadoop 2.0

Hadoop at Yahoo

• 2010 (biggest cluster):
– 4,000 nodes
– 16 PB of disk
– 64 TB of RAM
– 32,000 cores
• 2014:
– 16 clusters
– 32,500 nodes
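Dividing the 2010 figures by the node count gives the average resources per node. This assumes decimal units (1 PB = 1,000 TB) and an even spread across nodes, which real clusters need not follow:

```latex
\frac{16\ \text{PB}}{4000} = \frac{16{,}000\ \text{TB}}{4000} = 4\ \text{TB/node},\qquad
\frac{64\ \text{TB}}{4000} = \frac{64{,}000\ \text{GB}}{4000} = 16\ \text{GB/node},\qquad
\frac{32{,}000\ \text{cores}}{4000} = 8\ \text{cores/node}

\frac{32{,}500\ \text{nodes}}{16\ \text{clusters}} \approx 2{,}031\ \text{nodes/cluster (2014 average)}
```

The disk figure is raw capacity. With HDFS's default replication factor of 3, the logical data the cluster could hold is roughly a third of it.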

Oracle Big Data with Oracle Exadata

Trend #3: NoSQL

Database Market Disruption

$30B Database Market Being Disrupted

Operational vs. Analytical Databases

BigTable Data Model

One denormalized table:

Name   Site              Counter
Dick   Ebay              507,018
Dick   Google            690,414
Jane   Google            716,426
Dick   Facebook          723,649
Jane   Facebook          643,261
Jane   ILoveLarry.com    856,767
Dick   MadBillFans.com   675,230

Normalized (relational) form:

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507,018
1       2       690,414
2       2       716,426
1       3       723,649
2       3       643,261
2       4       856,767
1       5       675,230

BigTable form (one wide row per name, one column per site):

Id  Name  Ebay     Google   Facebook  (other columns)  MadBillFans.com
1   Dick  507,018  690,414  723,649   …                675,230

Id  Name  Google   Facebook  (other columns)  ILoveLarry.com
2   Jane  716,426  643,261   …                856,767
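Going from the normalized tables to BigTable's wide rows is a pivot. The Python sketch below uses the Name/Site/Counter values from the denormalized table. It ignores BigTable features such as column families and timestamped cell versions; the dictionary layout is only an illustration.

```python
from __future__ import annotations

# Values from the slide: names, sites and (NameId, SiteId, Counter) facts.
names = {1: "Dick", 2: "Jane"}
sites = {1: "Ebay", 2: "Google", 3: "Facebook", 4: "ILoveLarry.com", 5: "MadBillFans.com"}
facts = [(1, 1, 507_018), (1, 2, 690_414), (2, 2, 716_426), (1, 3, 723_649),
         (2, 3, 643_261), (2, 4, 856_767), (1, 5, 675_230)]


def to_wide_rows(facts, names, sites) -> dict[int, dict[str, object]]:
    """Pivot normalized facts into BigTable-style rows: one row per name and one
    column per site. Rows are sparse: a site with no counter simply has no column."""
    rows: dict[int, dict[str, object]] = {}
    for name_id, site_id, counter in facts:
        row = rows.setdefault(name_id, {"Name": names[name_id]})
        row[sites[site_id]] = counter
    return rows


wide = to_wide_rows(facts, names, sites)
print(wide[2])  # {'Name': 'Jane', 'Google': 716426, 'Facebook': 643261, 'ILoveLarry.com': 856767}
```

Reading "all counters for Jane" becomes a single-row lookup instead of a three-table join.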

Big Data/Hadoop Use Cases

• Financial services: Discover fraud patterns based on multiple years’ worth of credit card transactions, on a time scale that does not allow new patterns to accumulate significant losses. Measure transaction processing latency across many business processes by processing and correlating system log data.
• Internet retailer: Discover fraud patterns in Internet retailing by mining Web click logs. Assess risk by product type and session/Internet Protocol (IP) address activity.
• Retailers: Perform sentiment analysis by analyzing social media data.
• Drug discovery: Perform large-scale text analytics on publicly available information sources.
• Healthcare: Analyze medical insurance claims data for financial analysis, fraud detection, and preferred patient treatment plans. Analyze patients’ electronic health records to evaluate patient care regimes and drug safety.
• Mobile telecom: Discover mobile phone churn patterns by analyzing CDRs (call detail records) and correlating them with activity in subscribers’ networks of callers.
• IT technical support: Perform large-scale text analytics on help desk support data and publicly available support forums to correlate system failures with known problems.
• Scientific research: Analyze scientific data to extract features (e.g., identify celestial objects from telescope imagery).
• Internet travel: Improve product ranking (e.g., of hotels) by analyzing multiple years’ worth of Web click logs.

Document databases

• Structured documents – XML and JSON (JavaScript Object Notation) become more prevalent within applications

• Web programmers start storing these in BLOBs in MySQL

• Emergence of XML and JSON databases
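The sketch below contrasts JSON strings kept in BLOBs with a document database. `TinyDocStore` is a hypothetical toy, not any product's API. It parses documents once on write and indexes one field. The BLOB approach, by contrast, must deserialize every row on every query.

```python
from __future__ import annotations

import json
from collections import defaultdict

# JSON kept as opaque text "BLOBs": every query must parse every row.
blob_rows = [
    json.dumps({"id": 1, "user": "dick", "tags": ["hadoop"]}),
    json.dumps({"id": 2, "user": "jane", "tags": ["nosql", "json"]}),
    json.dumps({"id": 3, "user": "dick", "tags": ["json"]}),
]


def blob_scan(rows: list[str], tag: str) -> list[int]:
    """Full scan: deserialize each BLOB just to test one field."""
    return [doc["id"] for doc in map(json.loads, rows) if tag in doc["tags"]]


class TinyDocStore:
    """Hypothetical toy document store: keeps parsed documents plus a
    secondary index on one array-valued field."""

    def __init__(self, index_field: str) -> None:
        self.index_field = index_field
        self.docs: dict[int, dict] = {}
        self.index: dict[str, set[int]] = defaultdict(set)

    def insert(self, doc: dict) -> None:
        self.docs[doc["id"]] = doc
        for value in doc.get(self.index_field, []):
            self.index[value].add(doc["id"])

    def find(self, value: str) -> list[int]:
        # Index lookup: no document is parsed at query time.
        return sorted(self.index.get(value, set()))


store = TinyDocStore("tags")
for row in blob_rows:
    store.insert(json.loads(row))

print(blob_scan(blob_rows, "json"))  # [2, 3]
print(store.find("json"))            # [2, 3]
```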

Graph Database
• InfiniteGraph
• FlockDB

Document
• JSON based: MongoDB, CouchDB, RethinkDB
• XML based: MarkLogic, Berkeley DB XML

Key Value
• MemcacheDB
• Oracle NoSQL
• Dynamo
• Voldemort
• DynamoDB

Table Based
• BigTable
• Cassandra
• Hypertable
• Accumulo

How Do You Handle This Growth?

Scaling Out RDBMS

Are RDBMSs Not Enough?

NoSQL Technology Scales Out

A New Technology

Use Cases

Brewer's CAP Theorem

NoSQL Technology Spectrum

No Means Yes!

Big Data Architecture

Layers: data sources, data connectors, data lake, analytics. The data lake can be hosted in three ways:
• Option 1: Data lake on Oracle Big Data Appliance
• Option 2: Data lake on AWS big data infrastructure
• Option 3: Data lake on on-premise Hadoop infrastructure

On-Premise Hadoop as an RDBMS “Active Archive”

• Oracle Database keeps current data (SALES 2013, SALES 2012) for structured data analytics from apps.
• Older data (SALES 2011, SALES 2010) moves to Hadoop, which stores the structured archive alongside unstructured data.
• “Hive” provides an SQL-like query layer over Hadoop and MapReduce, enabling unstructured + structured data analytics from apps.
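The active-archive pattern moves older partitions out of the operational database while keeping them queryable alongside current data. It can be sketched with Python's built-in `sqlite3` module. SQLite stands in for both tiers here, so this is not how Oracle, Hive or EMR are actually connected. The table, view and function names are hypothetical.

```python
import sqlite3

# "main" plays the operational database; an attached in-memory
# database named "archive" plays the Hadoop/Hive archive tier.
con = sqlite3.connect(":memory:")
con.execute("ATTACH DATABASE ':memory:' AS archive")
con.execute("CREATE TABLE main.sales (year INTEGER, amount REAL)")
con.execute("CREATE TABLE archive.sales (year INTEGER, amount REAL)")
con.executemany(
    "INSERT INTO main.sales VALUES (?, ?)",
    [(2010, 100.0), (2011, 120.0), (2012, 150.0), (2013, 170.0)],
)


def archive_before(con: sqlite3.Connection, cutoff_year: int) -> None:
    """Move rows older than cutoff_year from the hot tier to the archive tier."""
    with con:  # one transaction: copy, then delete
        con.execute("INSERT INTO archive.sales SELECT * FROM main.sales WHERE year < ?", (cutoff_year,))
        con.execute("DELETE FROM main.sales WHERE year < ?", (cutoff_year,))


archive_before(con, 2012)

# A TEMP view may reference tables in attached databases, so queries
# can see hot and archived data as one logical table.
con.execute(
    "CREATE TEMP VIEW all_sales AS "
    "SELECT year, amount FROM main.sales UNION ALL SELECT year, amount FROM archive.sales"
)
print(con.execute("SELECT COUNT(*) FROM main.sales").fetchone())  # (2,)
print(con.execute("SELECT SUM(amount) FROM all_sales").fetchone())  # (540.0,)
```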

AWS EMR as an RDBMS “Active Archive”

• Oracle Database keeps current data (SALES 2013, SALES 2012) for structured data analytics from apps.
• Older data (SALES 2011, SALES 2010) moves to Amazon Elastic MapReduce (Amazon EMR), which stores the structured archive alongside unstructured data.
• “Hive” provides an SQL-like query layer over Amazon EMR, enabling unstructured + structured data analytics from apps.

Oracle Database Support for All Data

• Structured data
– Numeric, string, date, …
– Row and column formats
• Unstructured data
– LOB
– Text
– XML
– JSON
– Spatial
– Graph

Oracle Support for Any Data Management System

• Relational: Run the Business
– Scale-out and scale-up
– Collect any data
– Transactional and analytic applications for the enterprise
– Secure and highly available
• Hadoop: Change the Business
– Scale-out, low-cost store
– Collect any data
– MapReduce, SQL
– Analytic applications
• NoSQL: Scale the Business
– Scale-out, low-cost store
– Collect key-value data
– Find data by key
– Web applications
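In key-value stores, scale-out and finding data by key usually go together. The key determines which node owns it, so a lookup touches a single node. The minimal sketch below assumes simple hash-modulo partitioning, which is one common approach. `PartitionedKV` is hypothetical and is not Oracle NoSQL Database's API.

```python
from __future__ import annotations

import hashlib


class PartitionedKV:
    """Toy key-value store spread over N shards by hashing the key.
    Shard layout and hash choice are illustrative assumptions."""

    def __init__(self, num_shards: int) -> None:
        self.shards: list[dict[str, str]] = [{} for _ in range(num_shards)]

    def _shard_for(self, key: str) -> int:
        # Stable hash (unlike Python's per-process randomized hash()),
        # so every client maps the same key to the same shard.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value: str) -> None:
        self.shards[self._shard_for(key)][key] = value

    def get(self, key: str) -> str | None:
        # Find data by key: only the owning shard is consulted.
        return self.shards[self._shard_for(key)].get(key)


kv = PartitionedKV(num_shards=4)
kv.put("user:dick", "Ebay")
kv.put("user:jane", "Google")
print(kv.get("user:jane"))  # Google
```

Hash-modulo placement remaps most keys when the shard count changes. That is why stores such as Dynamo and Cassandra use consistent hashing instead.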

Big Data SQL

SELECT w.sess_id, c.name
FROM web_logs w, customers c
WHERE w.source_country = 'Brazil'
AND w.cust_id = c.customer_id;

• Relevant SQL runs on the BDA (Oracle Big Data Appliance) nodes
• 10s of gigabytes of data
• Only the columns and rows needed to answer the query are returned

[Figure: Big Data SQL connecting Oracle Database and the Hadoop cluster; tables CUSTOMERS and WEB_LOGS]
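Returning only the needed columns and rows is known as predicate and projection pushdown. The conceptual Python sketch below uses hypothetical WEB_LOGS rows and customer names. It shows the data flow only and is not Big Data SQL's implementation.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical WEB_LOGS rows spread across two storage nodes of the Hadoop cluster.
node_blocks: List[List[Row]] = [
    [{"sess_id": 1, "cust_id": 10, "source_country": "Brazil", "page": "/a"},
     {"sess_id": 2, "cust_id": 11, "source_country": "India", "page": "/b"}],
    [{"sess_id": 3, "cust_id": 12, "source_country": "Brazil", "page": "/c"},
     {"sess_id": 4, "cust_id": 10, "source_country": "Chile", "page": "/d"}],
]


def storage_scan(block: List[Row], predicate: Callable[[Row], bool], columns: List[str]) -> List[Row]:
    """Runs next to the data: apply the WHERE clause and keep only the
    projected columns before anything crosses the network."""
    return [{c: row[c] for c in columns} for row in block if predicate(row)]


def offloaded_query(predicate: Callable[[Row], bool], columns: List[str]) -> List[Row]:
    # The database side receives only the already-reduced row sets.
    results: List[Row] = []
    for block in node_blocks:
        results.extend(storage_scan(block, predicate, columns))
    return results


customers = {10: "Ana", 11: "Ravi", 12: "Bruno"}  # hypothetical CUSTOMERS data
rows = offloaded_query(lambda r: r["source_country"] == "Brazil", ["sess_id", "cust_id"])
# The join with CUSTOMERS happens on the database side, as in the query above.
print([(r["sess_id"], customers[r["cust_id"]]) for r in rows])  # [(1, 'Ana'), (3, 'Bruno')]
```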

SQL Push Down in Big Data SQL

• Hadoop scans on unstructured data
• WHERE clause evaluation
• Column projection
• Bloom filters for better join performance
• JSON parsing, data mining model evaluation
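A Bloom filter is a compact bit array that answers either "definitely not present" or "possibly present". It is built from the join keys of the smaller table and applied during the scan of the larger one, so it discards most non-matching rows early. The sketch below is generic: the class name, sizes and double-hashing scheme are assumptions, not Oracle's implementation.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Bit array plus k hash positions per key. It may report false
    positives but never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Double hashing: k positions derived from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


# Build the filter from the join keys of the small side (CUSTOMERS) and
# apply it while scanning the big side (WEB_LOGS), before rows leave storage.
customer_ids = ["10", "11", "12"]
bf = BloomFilter(num_bits=1024, num_hashes=3)
for cid in customer_ids:
    bf.add(cid)

web_log_cust_ids = ["10", "99", "12", "57"]
survivors = [cid for cid in web_log_cust_ids if bf.might_contain(cid)]
```

False positives are possible, so the surviving rows still go through the exact join. The filter only reduces how many rows reach it.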

Data Analytics Challenge

Separate silos of information to analyze

Separate data access interfaces

SQL on Hadoop is Obvious


Stinger

No comprehensive SQL interface across Oracle, Hadoop and NoSQL

Oracle Big Data Management System

Rich, comprehensive SQL access to all enterprise data

Before After

What Does Unified Query Mean for You?

Data Science

Anyone

Before After

What Does Unified Query Mean for You?

Application Development

Big Data SQL: A New Hadoop Processing Engine

• Processing layer: MapReduce and Hive, Spark, Impala, Search, Big Data SQL
• Resource management: YARN, cgroups
• Storage layer: filesystem (HDFS), NoSQL databases (Oracle NoSQL DB, HBase)

What Next for DBAs in the Big Data Era?
• NoSQL
• Hadoop
• Big Data SQL
• 12c new features for big data
• Engineered Systems knowledge

Connect with Us

Web: www.appsassociates.com

Email: satyendra.pasalapudi@appsassociates.com | satyendra.kumar@aioug.org

YouTube: www.youtube.com/user/AppsAssociates

LinkedIn: www.us.linkedin.com/company/apps-associates

Twitter: @AppsAssociates

Facebook: www.facebook.com/AppsAssociatesGlobal

Thank You! @pasalapudi
